ABSTRACT


The most common method of creating and loading a new database is
to write a program using the host language macros the database
management system provides. As for all software production, the cost of
writing this program is high, particularly considering it may be
executed only once. The "Source to S2K (System 2000) Conversion System"
will generate a FORTRAN program which will load the described source
file into the described System 2000 database. The user inputs these
file descriptions and the source to target mapping transformations. The
system's design is based on a common architecture developed through
research of seven current conversion system implementations. This
report will present this architecture, detail the design and languages
of the "Source to S2K Conversion System" and comment on its
implementation. Appendices include a User's Manual and examples of
generated command files. The system has been implemented in PASCAL in a
Control Data 6000 series computing environment.

CHAPTER 1
Introduction 

1.1  Statement  Of  Problem 

An application which stores data under the control of a database
management system (DMS) must initially "load" its data. The process of
loading this data can be viewed as a data conversion problem -- how to
convert the raw application data from its present form and format to one
which is required by the DMS. Most established DMSs provide two initial
load capabilities [10]. The first is an automatic load function.
Typically this requires the raw data to be in a specific format, usually
with delimiters surrounding each field value. In addition, the load
function usually requires more computer processing time to load the same
amount of data than the second conversion method -- a user written
program. The user written program utilizes the DMS's file manipulation
macrocommands in one of several host languages. A user written program
also enables the user to include validation and conversion routines.
These routines rarely are part of the DMS's load function. Even though
there are major advantages in processor time and flexibility, the user
written program is expensive to produce, especially if it is executed
only once. A method is needed, therefore, which will allow a simple,
flexible, and cost effective means of converting initial load data into
its underlying DMS data structure without requiring any special data
formatting or high software production costs.

1.2 Report Objectives

The objectives of this report are to document the design,
implementation and correct usage of the "Source To S2K Conversion
System". This system is a solution to the previously stated problem.
It is a simple, flexible system which allows automatic generation of
initial load programs for MRI's System 2000 (S2K) database management
system. Because it generates a complete FORTRAN program, it has the
advantages of efficient processor utilization, validation and conversion
routine capabilities, and no special formatting of the input data.
Because the program is generated from a small amount of user input, it
is also cost effective compared to writing the program by hand.

Like all systems, however, the Source To S2K Conversion System has
its limitations. The generated program can read in only one input
source file at a time. Subsequent programs can be generated which will
allow updating of the initial database, but from only one file per
program. The target database must be an S2K defined database. Because
S2K supports a hierarchical data model, the data transformations from
the input source file to the target database are also based on a
hierarchical data model. As a consequence of this design, the input
source file must be describable in a hierarchical manner. All
validation and conversion routines must be written by the user in ANSI
Standard FORTRAN. Within these limitations, however, lie a great number
of source file to target S2K database conversion capabilities.


1.3 Design and Implementation Objectives

The design and implementation objectives of the Source To S2K
Conversion System were not to develop new approaches or methodologies.
The objectives were to study existing data conversion implementations,
gleaning from them the required components and functions of a conversion
system, design the Source To S2K system based on this research, and
finally, implement the system using disciplined, structured software
engineering principles. In order to properly document how these
objectives were accomplished, the remainder of Chapter 1 will discuss a
common architecture for data conversion systems. This architecture was
developed from the study of seven different data conversion system
implementations. Based on this common architecture, the Source To S2K
Conversion System was designed. Chapter 2 will report this design and
details of its implementation. Appendix A is the system's User's
Manual. It also contains execution instructions and a complete example.
Appendix B is an example of generated UT-2D command files needed to
execute the system.

1.4 A Common Architecture For Data Conversion Systems


Seven different data conversion implementations were studied to
find their common functions and components. The implementations studied
were: IBM's EXPRESS System, 1977 [19]; SDC's CODS System, 1975 [3,18];
University of Michigan's Data Translation Project, 1976 [6,11,12]; J.A.
Ramirez's (University of Pennsylvania) Conversion System, 1974 [15,16];
CODASYL Stored Data Definition and Translation (SDDT) Task Group
COBOL-to-NIPS/360 Prototype, 1973 [7]; Honeywell's File Translator
Prototype, 1975 [1]; and the ASAP-TO-REL System (University of
Pennsylvania), 1975 [2,17]. Although the seven systems studied differ
in purpose, basic approach and architecture, they all contained the same
functional components. These components have been grouped into three
sections: Definition, Logic, and Execution (see figure 1). The
Definition section is composed of the definition languages the
conversion system uses. Since most systems use the hierarchical data
model, the Data Definition Language (DDL) used to describe the source
and target files looks much like a COBOL Data Definition. The required
steps of restructuring the source file to produce the target file are
usually contained in the "conversion" language. Any special source
translation and value conversions may appear in the DDL (seen in the
Ramirez system) or in the conversion language (seen in EXPRESS).


A transition function between the Definition section and the
Logic section is the Language Processor. The DDL and Conversion
language statements must be parsed and checked for syntax. Some systems
(CODS) have elegant semantic analysers which guard against ambiguous
conversion statements and redundant or impossible constructs. All
systems must combine the information gained from the definition and
conversion statements and build a series of symbol tables. These symbol
tables may be used in both the Logic and Execution sections.

Figure 1. Basic Components of a Generalized Data Conversion System.

The Logic section determines three things: in what format to read
the source file, what is required to restructure, translate and convert
each source record, and in what format to write the target file. The read
and write logic takes information from the symbol tables and file
descriptions to determine the structure of the record and the location
of specific fields. An example of this is the logic required to read a
record containing a repeating group. Some type of WHILE loop would have
to be executed (or generated, for non-interpretive systems) until a
given delimiter had been received.
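
The read logic for a repeating group can be illustrated with a short
Python sketch. The record layout, the "*" delimiter and the function
name are assumptions made for this example only; none of them is taken
from the systems studied.

```python
def read_repeating_group(fields, start, delimiter="*"):
    """Collect field values from `start` until `delimiter` is seen.

    Returns the collected occurrences and the index just past the delimiter.
    """
    occurrences = []
    i = start
    # The WHILE loop: keep reading values until the delimiter is received.
    while i < len(fields) and fields[i] != delimiter:
        occurrences.append(fields[i])
        i += 1
    return occurrences, i + 1


# Hypothetical record: a name, a repeating group of parts, then a city.
record = ["ACME", "WIDGET", "GEAR", "BOLT", "*", "DENVER"]
name = record[0]
parts, next_pos = read_repeating_group(record, 1)
city = record[next_pos]
print(name, parts, city)
```

An interpretive system would execute a loop of this kind for each
record, while a generative system would emit the equivalent loop into
the program it produces.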

The Conversion logic is more difficult. If the conversion
language is procedural (a small set of primitive conversion functions),
the conversion logic is usually a set of generalized procedures
corresponding to the conversion functions. This is the case for EXPRESS
and CODS. Non-procedural conversion languages rely on deriving the
conversion required by comparing the source and target DDLs. Any
complex restructuring is accomplished by user written procedures (seen
in Ramirez's System). For this approach, the conversion logic is
reduced to simple mapping and proper procedure binding.

The Execution section contains the functions performed during
the actual conversion. Before describing this, the interpretive vs
generative approach must be explained. Generative systems generate
programs which, when compiled, will perform the conversion desired.
Interpretive systems determine how to convert each record and perform
the conversion all in the same step. Thus, interpretive systems have
their Logic and Execution sections combined. A generative system would
have a compiler between the Logic and Execution sections.


The Execution section performs the actual conversion. This
includes reading the file, writing it in an intermediate source format,
converting the file to an intermediate target format, and finally,
writing the target file. Let us now look at the details of several
implementations and see how they map to the common architecture just
presented. The specific details studied will be the purpose of each
system, the number of files it can handle, the data model it uses, and
whether it generates code or is interpretive. These attributes, for the
seven systems studied, are summarized in figure 2.


1.5 Comparison Of Implementations


The number and type of files a conversion system will support is
greatly influenced by its purpose. For example, the purpose of CODS is
to convert a source database to a target database, using the source and
target database management systems to do the storage and physical level
conversion. The Ramirez System converts a source sequential file to a
target sequential file. His system cannot use a database file, nor can
CODS convert a file without the DMS. The ASAP-TO-REL system converts
flat files produced by ASAP (a sequential file management system) to
relational database load strings in REL (Relational English) format.
Thus, its purpose is to allow a subset of a very large sequential file
to be converted into a relational database and queried by the
REL system [2]. Although the purposes and files of these data
conversion systems vary greatly, they functionally are the same. Each
system reduces the source files to one consolidated intermediate file.
The format of this file is known. The target file's format is known,
and a single target file is produced. IBM's EXPRESS further illustrates
this process. EXPRESS can convert multiple input files, database
supported or not, into multiple output files. It utilizes a "Reader"
step to read all source files into a single intermediate file. The
conversion step converts the source intermediate file into a single
intermediate target file. The "Writer" step then writes out this
intermediate target file into the size and format the user wants. Thus,
although the purpose, number and type of files differ among systems,
they functionally execute the same.

The data model used by a system will affect its Data Description
Language (DDL) and restructuring language more than its functional
architecture. As previously mentioned, most systems use a hierarchical
model. The rationale is that this model is familiar to the user (COBOL
programmer), the restructuring dynamics are well known [1JJ, and the
major commercial database management systems support the model.
The University of Michigan's Data Translation Project, however, uses a
relational data model. The relational model allows for a normal form of
data for its intermediate source and data file. This allows for more
general and efficient conversion [6]. If, for instance, a relational
source file needs converting to a hierarchical target file, the
conversion would require a total reading of the source file before the
conversion began. However, if the majority of the time the system is
converting hierarchical to hierarchical files, it probably would not be
efficient to use the Michigan approach. In summary, the hierarchical
model is the most popular model supported. If non-hierarchical files
need converting, either a complete read step is required (as done by the
Michigan System and EXPRESS) or software support outside the conversion
system (as in CODS) is required.

The final feature to discuss is whether the conversion system is
interpretive, or whether it generates code to be compiled. Although the
approaches are clearly different, the logic required in the systems is
the same. With the interpretive approach, the system programs are
generalized, whereas for the generator approach, the method of
constructing programs is generalized. In both cases the logic, for
instance, to read all of the occurrences of a particular record vs
generating the code to do the same function requires the same amount of
knowledge about the record and its structure. Thus, the read step for
an interpretive system reads the source record, while the generative
system produces the code to read the source record. Functionally, and
logically, they are the same. There are, however, some run time
differences between the two approaches. Interpretive systems must
execute logic using the DDL to determine how to read, convert, and write
each record. Generative systems perform the reads, conversion, and
writes directly since a specific program has been generated and compiled
to do such. Better run time efficiency can be expected from the
generative system due to the direct execution. There is, however, the
system overhead of creating and compiling the generated program.
No literature on actual performance comparisons is known.
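
The equivalence of the two approaches can be sketched in Python. The
field layout, the function names and the use of `exec` in place of a
separate compile step are all assumptions made for illustration; no
system studied is implemented this way.

```python
# Hypothetical field layout: (name, start column, width).
LAYOUT = [("id", 0, 4), ("name", 4, 10), ("qty", 14, 3)]


def interpret_record(layout, line):
    """Interpretive approach: consult the layout for every record read."""
    return {name: line[start:start + width].strip()
            for name, start, width in layout}


def generate_reader(layout):
    """Generative approach: emit source code specialised to one layout."""
    lines = ["def read_record(line):", "    return {"]
    for name, start, width in layout:
        lines.append(f"        {name!r}: line[{start}:{start + width}].strip(),")
    lines.append("    }")
    return "\n".join(lines)


source = generate_reader(LAYOUT)
namespace = {}
exec(source, namespace)  # stands in for compiling the generated program
read_record = namespace["read_record"]

line = "0042WIDGET    017"
print(source)
print(interpret_record(LAYOUT, line) == read_record(line))
```

Both routines need the same knowledge of the record structure. The
interpretive one looks it up on every call, whereas the generated one
has the column positions written into its code.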

1.6 Common Architecture Details

It has been shown that the common architecture is a valid
representation of the required functions and components of a data
conversion system. This section will examine each functional component
in more detail. Examples from the studied implementations will be used
to illustrate functional component specifics.

1.6.1 The Definition Section Functions

A data description language (DDL) must be capable of describing
the structure of the source and target files (a hierarchical model will
be assumed for this discussion). Shoshani [18] describes three levels
of data structure description: logical, storage, and physical. The
logical description itemizes the entities of the record, the relations
among them, and the size and type of the fields. The storage level
describes such things as file indexing organizations, access paths, and
fixed or variable length records. Finally, physical level descriptions
indicate how and where data is to be read and written, such as device
type, blocksize and label information. If the conversion system
converts the storage and physical level as well as the logical level,
the DDL must have the capability of describing all three levels. This
is the type of DDL the Ramirez and Michigan systems use. CODS, on the
other hand, uses the source and target database management system
facilities to perform the storage and physical conversion. The CODS DDL
is therefore much smaller and simpler. If the source and target storage
and physical levels are fixed (but not necessarily the same), the DDL,
again, would not have to describe all three levels. This is the case
with the ASAP-TO-REL system where the input is always an ASAP file and
the output is always Relational English Language strings.

There are two basic approaches for describing conversion
specifications -- procedural and non-procedural. The non-procedural
approach requires the user to describe the source file, the desired
target file and the translation rules. Michigan's Translation
Definition Language (TDL) is an example of the non-procedural approach.
The procedural approach requires the user to specify, in terms of the
conversion language primitives, the specific steps required to enact the
conversion and the order in which to execute them. Examples of
procedural conversion languages are EXPRESS' "CONVERT" and CODS'
"CDTL". Proponents of the non-procedural approach believe it is less
restrictive and easier to use [15]. Proponents of the procedural
languages believe they are more powerful, efficient and direct [18].

The CODS Common Data Translation Language (CDTL) is
representative of the procedural conversion languages studied. It
consists of eleven primitive operations (EXPRESS' "CONVERT" has nine
primitives). The primitives describe the basic data transformations
required to restructure hierarchical data model structures, plus varied
validation and conversion capabilities. The data transformation
operations are of three types: 1.) moving values across on the same
level, 2.) moving data values down and repeating them in each of their
members, and 3.) performing an operation on a set of lower level values
and moving this new single value up, or moving a specific occurrence of
a lower level value up. Details on the meanings of these
transformations are in Appendix A -- User's Manual.
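
The three transformation types can be illustrated with a small Python
sketch over a nested record. The record contents and the function names
`move_across`, `move_down` and `move_up` are hypothetical; they are not
the CDTL or CONVERT primitives themselves.

```python
# Hypothetical hierarchical source record: one department, many employees.
source = {
    "dept": "TOOLS",
    "employees": [
        {"name": "ADAMS", "salary": 900},
        {"name": "BAKER", "salary": 1100},
    ],
}


def move_across(record, field):
    """Type 1: move a value across on the same level."""
    return record[field]


def move_down(record, field, group):
    """Type 2: repeat a parent value in each member of a lower level group."""
    return [dict(member, **{field: record[field]}) for member in record[group]]


def move_up(record, group, field, operation=None, occurrence=None):
    """Type 3: move a single value up from a lower level.

    Either reduce all occurrences with `operation`, or pick one `occurrence`.
    """
    values = [member[field] for member in record[group]]
    if operation is not None:
        return operation(values)
    return values[occurrence]


target = {
    "dept": move_across(source, "dept"),
    "payroll": move_up(source, "employees", "salary", operation=sum),
    "first_employee": move_up(source, "employees", "name", occurrence=0),
    "employees": move_down(source, "dept", "employees"),
}
print(target)
```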

The final function in the Definition section is the parsing,
syntax checking and symbol table building. The literature gives little
detail on these implementations. It is assumed that basic compiler
principles are used.

1.6.2  The  Logic  Section  Functions 

When discussing the next two sections the reader is reminded
that the interpretive and generative systems will differ slightly. The
interpretive system will execute the code corresponding to the logic it
just performed. The generative system will output high level code
corresponding to the logic it just performed.

The read function is usually implemented by traversing the data
descriptions and previously built symbol tables. As each field is
parsed, a position in the input buffer is filled. For systems requiring
storage conversion, the read function must have a subroutine
corresponding to each possible access method. For physical level
conversion most systems take advantage of the operating system they are
executed on by merely setting appropriate file attributes. This may be
done dynamically for interpretive systems, or in the generated Job
Control Language (JCL) for generative systems.

The conversion logic is implemented differently based on the
procedural/non-procedural characteristic of its conversion language, as
previously discussed. CODS uses the CDTL statements and the CDDL symbol
tables to build a conversion table. Each table entry consists of the
primitive's ID number and the relative address of the source and target
fields. During execution (CODS is interpretive) each conversion table
entry is executed by a CASE statement using the primitive number as the
key. The Ramirez system uses a non-procedural conversion language
(DML). It is a generative (non-interpretive) system. Its
implementation requires the user to specify the maximum number of
occurrences any repeating group may have. The strategy is to build the
source and target record buffers large enough to hold the largest
possible source/target record. During execution the read function
expands the source record into a large fixed format record. The
conversion function will then execute the data transformation operations
from the source input buffer to the target output buffer. Input
validations or special conversions must be written by the user in PL/I
procedures and submitted as part of the conversion statements.
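
A conversion table driven by a CASE statement, in the style described
for CODS, might be sketched as follows in Python. The primitive ID
numbers, the three primitives and the buffer contents are assumptions
for this example; the actual CDTL primitive set is not reproduced here.

```python
# Hypothetical primitive ID numbers (not the real CDTL numbering).
MOVE, UPPERCASE, SUM_GROUP = 1, 2, 3


def execute(table, source, target):
    """Run each (primitive_id, source_address, target_address) entry in order."""
    for primitive, src, dst in table:
        # The chain below plays the role of the CASE statement keyed on the
        # primitive number.
        if primitive == MOVE:
            target[dst] = source[src]
        elif primitive == UPPERCASE:
            target[dst] = source[src].upper()
        elif primitive == SUM_GROUP:
            start, count = src
            target[dst] = sum(source[start:start + count])
        else:
            raise ValueError(f"unknown primitive {primitive}")
    return target


source_buffer = ["acme", 10, 20, 30]
conversion_table = [
    (MOVE, 1, 2),            # copy source field 1 to target field 2
    (UPPERCASE, 0, 0),       # convert source field 0 into target field 0
    (SUM_GROUP, (1, 3), 1),  # add three values starting at source field 1
]
target_buffer = execute(conversion_table, source_buffer, [None, None, None])
print(target_buffer)
```

Because the table holds only numbers and addresses, the same `execute`
routine serves every conversion job, which is the sense in which the
interpretive system's programs are generalized.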

Some systems (ASAP-TO-REL) use the operating system to perform
"value" conversions, such as Hollerith to EBCDIC code conversion. Other
systems (EXPRESS and the Michigan systems) perform the conversions
themselves. CODS has a separate language, Common Format Definition
Language (CFDL), and a separate functional component which performs the
"value" conversions. Most systems support table look-up value
translations, but, obviously, the user is required to fill the table
(for interpretive systems) or write the translation subroutine (for
generative systems).
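
A table look-up value translation amounts to very little code once the
user has supplied the table. The Python sketch below uses a hypothetical
department code table and a hypothetical default value for codes that
are not found.

```python
# Hypothetical look-up table supplied by the user.
DEPARTMENT_NAMES = {"01": "SALES", "02": "SHIPPING", "03": "ACCOUNTING"}


def translate(value, table, default="UNKNOWN"):
    """Return the table entry for `value`, or `default` when none exists."""
    return table.get(value, default)


codes = ["02", "01", "99"]
print([translate(code, DEPARTMENT_NAMES) for code in codes])
```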

1.6.3 The Execution Section Functions

The first function during execution is to read the source file.
If the system can handle several files, most implementations read all of
the files and combine them into a single intermediate source file. This
is done by the Michigan, EXPRESS, and CODS systems. It is not necessary
to read the entire source file before converting. The Ramirez and
ASAP-TO-REL systems read a source record, convert it, and write the new
target record out one at a time. These systems usually can handle only
one input source file and are guaranteed it will be in a specific
storage format (i.e. sequential file with variable length records).
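
The record-at-a-time style of execution can be sketched in Python as
follows. The comma separated source format, the target format and the
function names are assumptions for illustration only.

```python
import io


def read_records(stream):
    """Yield one source record at a time instead of loading the whole file."""
    for line in stream:
        yield line.rstrip("\n")


def convert(record):
    """Hypothetical conversion: upper-case the name, zero-pad the quantity."""
    name, qty = record.split(",")
    return f"{name.strip().upper()}|{int(qty):05d}"


def run(source, target):
    """Read, convert and write each record before reading the next one."""
    count = 0
    for record in read_records(source):
        target.write(convert(record) + "\n")
        count += 1
    return count


src = io.StringIO("bolt, 12\nnut, 7\n")
dst = io.StringIO()
n = run(src, dst)
print(n, dst.getvalue())
```

Only one record is held in memory at a time, so no intermediate source
file is needed. The cost is that the conversion cannot look ahead at
later records.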

The implementation of the conversion step is usually motivated
by efficiency factors. The number of I/O operations must be kept to a
minimum as well as memory to memory data moves. EXPRESS implements a
"pipelining" technique to increase its efficiency. The Michigan system
has been making efforts towards bypassing the conversion step for
records which do not require conversion (aggregate schema facility).
Most of the "minor" implementations have not introduced any significant
efficiency features and execute the conversion step quite
straightforwardly.


A final comment should be made on execution flexibility.
Flexibility in this sense means: 1.) the ability to handle the hard to
describe, very unusual conversion requirement, and 2.) the ability to
execute the conversion in incremental steps. The generative systems
usually allow more flexibility in regard to handling the unusual
conversion case. This is because the generated code can usually be
accessed and modified prior to its execution. EXPRESS produces separate
read, conversion and write PL/I procedures for each job. During
execution the EXPRESS system calls these procedures based on the
conversion phase it is in and the data being operated on. The Ramirez
system produces a complete, self-contained PL/I program. The execution
step is conducted completely free of any conversion system support. The
EXPRESS system could be difficult to alter, particularly if the desired
change was in the control portion of the program. The Ramirez system,
however, would be much easier to modify since it is a complete,
self-contained program. The advantage of the Ramirez self-contained
program is also a disadvantage in terms of incremental step execution.
The only way to break up the Ramirez conversion is to stop its execution
and rely on some "restart" mechanism to start it at a later time. Other
systems, such as EXPRESS, Michigan and CODS, allow separate reading,
converting and writing of the files to be converted. With this
flexibility, the conversion can run even though a large block of
computer time is not available.

1.7  Summary  of  Common  Architecture 

Based on the examination of seven data conversion
implementations, the common functions of a data conversion system have
been identified. These include a DDL to describe the source and target
files, a conversion language to describe the source to target field
mappings, and read, conversion and write modules. Differences in DDLs
were found to be based on the data model the language used, and how many
data structure levels it converted (logical, storage, and physical).
Conversion language differences arose depending on whether the language
approach was procedural or non-procedural. Read and write module
differences were based on how many source/target files the system could
handle. Conversion module implementations differed due to the
procedural/non-procedural language approach, and efficiency factors
introduced. Finally, whether the system took an interpretive or
generative approach appeared to affect its output (converted records or
a conversion program) more than its architecture. Based on this common
architecture, the Source To S2K Conversion System was designed.

CHAPTER 2

DESIGN  AND  IMPLEMENTATION 


This chapter will discuss the design of the Source to S2K
Conversion System and document its software implementation. The design
discussion will follow the organization of the common conversion system
architecture, as presented in Chapter 1. The implementation discussion
will present the general software organization and major data
structures, and itemize the main procedures, their functions, inputs and
outputs.

2.1  System  Description 

The Source to S2K Conversion System design lent itself well to a
"top-down" development. The system's purpose, to convert source files
to S2K databases, was well defined. Because the S2K system provides a
conversion facility through execution of a Program Language Interface
(PLI) program, generating a new program for each conversion job appeared
to be the best approach. Using the hierarchical data model also was a
natural choice since the target file would always be an S2K database.
In order to simplify the implementation, the number of source files was
restricted to one, as was the number of different target databases.
Generating a PLI FORTRAN program was chosen over generating a PLI COBOL
program due to local support. Thus, starting with the purpose of the
system and some basic decisions, the design of the system developed. It
would take as input a description of the source file, S2K database, and
conversion mappings, and produce a PLI FORTRAN program which, when
executed, would perform the actual conversion. Figure 3 shows this
design. Design details of the system's Definition, Logic, and Execution
sections are now presented.

2.2 Definition Section Design

Languages had to be designed which allowed the user to input the
necessary information needed to generate the FORTRAN program. These
languages included one to describe the source file, one to describe the
mappings between the source and target files, and a third for
miscellaneous system input. A special target description language was
not necessary as the required S2K database description input could be
used.

2.2.1  Source  File  Definition  Language  Design 

Figure 3. Data flow of a Source to S2K Conversion System job.

As discussed in Chapter 1, there are three levels of data
structure that must be described: logical, storage, and physical.
Since the Source to S2K system has a limited scope, extensive
description capability for all levels was not necessary. Specifically,
the UT-2D operating system has no direct means of describing file
storage characteristics, other than stating the file is "local" or
"foreign". Secondly, since the target file is an S2K database, commonly
stored on disk, the physical conversion requirements will be small.
Therefore the storage and physical level descriptions can be simple,
consisting of keywords followed by user input. For example,

FILE = INPUT/1234/9876.
DEVICE = DISK.

indicates the input source file name is "INPUT" and it resides on
permanent disk library number "1234" (password "9876").
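
As an illustration of how keyword cards of this kind might be read, the
following sketch splits each card into its keyword and user input. It
is written in Python for brevity and is not taken from the system
itself. The function names parse_card and parse_cards, the dictionary
result, and the treatment of the trailing period as a terminator are
assumptions made for this example.

```python
def parse_card(card):
    """Split a 'KEYWORD = value.' card into (keyword, value)."""
    keyword, sep, value = card.partition("=")
    if not sep:
        raise ValueError(f"not a keyword card: {card!r}")
    value = value.strip()
    if value.endswith("."):          # drop the terminating period, if any
        value = value[:-1].rstrip()
    return keyword.strip().upper(), value


def parse_cards(cards):
    """Collect keyword cards into a dict; blank cards are skipped."""
    options = {}
    for card in cards:
        if card.strip():
            keyword, value = parse_card(card)
            options[keyword] = value
    return options


options = parse_cards(["FILE = INPUT/1234/9876.", "DEVICE = DISK."])
# The FILE value carries the file name, library number, and password.
name, library, password = options["FILE"].split("/")
```

A table of this form would let later generation steps look up each
option by keyword without knowing the order in which the cards were
supplied.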

The source file logical description is also simple, due to two
restrictions. First, the file must be describable in a hierarchical
manner. Secondly, since the UT-2D storage structure capabilities are
limited, all source records must be fixed length. This implies all
repeating groups have a defined maximum number of times they may repeat.
The logical description is thus reduced to field names, field
specifications, and the maximum number of times a group may repeat. The
field names consist of the letter "S" followed by an integer, starting
with one and increasing by one for each new field. A comment field is
provided to make the field name more meaningful (e.g. "S3 A20. Company
name."). Since FORTRAN FORMAT statements will be generated from the
source input, the field specifications use the same notation as the
format statements. An example of a logical structure description is
given in figure 4. More examples may be found in the User's Manual,
Section 2.C.

Figure 4. Source Description Language Example.

2.2.2 Conversion Language Design

A procedural language approach was taken for the conversion language.
Based on the systems studied, it appeared to be the least ambiguous for
the user and the easiest to implement. Seven primitives were designed,
each corresponding to either a data transformation operation, a
conversion or validation operation, or the special STORE operation.
CODS's conversion language [18] strongly influenced this design. CODS
is a DMS to DMS conversion system, requiring the source and target DMSs
to handle all physical and storage structure conversions. Its
conversion language primitives are concerned only with the logical level
conversion, and focus on the three basic hierarchical model data
transformations needed to map source to target data structures. These
transformations are discussed in Chapter 1, Section 1.5.1, and the
User's Manual, Section 2.E.

Along with the data transformation primitives, conversion language
primitives for validation and conversion were also designed. The
conversion primitive allows the user to write FORTRAN code which will be
included in the generated program. This code should perform a unique
conversion on one, or several, source fields to produce a single target
value. The validation primitive allows the user to input FORTRAN code
for the purpose of validating a particular input source field. The user
also specifies the action that execution should take (reject the
validated field or reject the data set occurrence) should the validation
fail. The validation primitive is a feature not seen in any of the
implementations studied. Ramirez's system allows users to input PL/1
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the  source  data,  the  validation  primitive  is  an  important,  practical 
feature.

The final conversion primitive is the special STORE operation. The user
is expected to input a data transformation primitive for each target
field in the order the fields are defined. After the transformation for
the last field in a particular group is input, the primitive STORE must
be input. This specifies to the system that all target fields for this
group have been "filled" and the new data set should be written. A data
transformation primitive for the first target field of the next group
should then be input. The last input for this group should, again, be
followed by a STORE primitive. This process should be continued until
the end of the defined target database is reached. Further details of
the conversion language and examples are contained in the User's Manual,
Section 2.E. Figure 5 contains a summary of the conversion language
primitives.
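
The ordering rule for STORE can be illustrated with a short check,
written here in Python. This is only a sketch of the rule as described
above, not the system's parser. The target groups and field names
(DAD-NAME, CHILD-NAME, PET-NAME), the pair representation of a
conversion statement, and the function check_store_order are all
hypothetical.

```python
# Hypothetical target description: each group lists its fields in
# definition order. The names are illustrative only.
TARGET_GROUPS = [
    ("DAD", ["DAD-NAME"]),
    ("CHILD", ["CHILD-NAME"]),
    ("PET", ["PET-NAME"]),
]


def check_store_order(statements, groups=TARGET_GROUPS):
    """Return a list of error messages for a sequence of primitives.

    `statements` is a list of (primitive, target) pairs. The target is
    a field name for a transformation and a group name for STORE.
    """
    # Expected order: every field of a group, then a STORE for the group.
    expected = []
    for group, fields in groups:
        expected.extend(("FIELD", field) for field in fields)
        expected.append(("STORE", group))

    errors = []
    for pos, (want_kind, want_name) in enumerate(expected):
        if pos >= len(statements):
            errors.append(f"missing {want_kind} for {want_name}")
            continue
        primitive, target = statements[pos]
        got_kind = "STORE" if primitive == "STORE" else "FIELD"
        if (got_kind, target) != (want_kind, want_name):
            errors.append(
                f"statement {pos + 1}: expected {want_kind} {want_name}, "
                f"got {primitive} {target}"
            )
    if len(statements) > len(expected):
        errors.append("statements found after the last STORE")
    return errors
```

An empty result means every target field was filled in definition order
and every group was closed by its STORE.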

2.2.3 Miscellaneous System Input

Information on the S2K database file name and several system options
was needed to complete the generated FORTRAN program and the generated
UT-2D command files. A "keyword followed by user input" format was
designed to give the user this input capability. For example,

RUN = S

is an option card specifying the run is for syntax only. The proper
input to specify that the run is a full generation run is

RUN = F.

All of the keywords and user input options are discussed in Section 2.H
of the User's Manual.

2.3 Logic Section Design

For generative conversion systems, the logic section is where the
conversion program is generated. Using the user's input, read,
conversion, and write modules must be generated. In addition, other
required code must be generated, such as database schema and local
declarations, opening and closing of the database, and error detection
procedures. This required code is fairly static, requiring little change
from job to job. The read, conversion, and write modules are far more
dynamic and require more complex algorithms. Their design will be
discussed here.

PRIMITIVE
NAME         TYPE             FUNCTION
----------   --------------   --------------------------------------
DIRECT       TRANSFORMATION   The transformation used to move source
                              to target fields that are in
                              correspondence.

REPEAT       TRANSFORMATION   The transformation used to move a
                              source field in an ancestor data set
                              to a target field.

LEVELUP      TRANSFORMATION   The transformation used to move a
                              specific occurrence of a source field
                              in a subordinate set to a target
                              field.

UPOP         TRANSFORMATION   The transformation used to apply an
                              operation against all occurrences of
                              a source field in a subordinate set.
                              The results of the operation are
                              moved to the target field.

CONVERSION   USER WRITTEN     Signals the input of a user written
                              FORTRAN module. The module will
                              perform a conversion on one or several
                              source fields.

VALIDATE     USER WRITTEN     Signals the input of a user written
                              FORTRAN module and instructions for
                              execution should the validation
                              module return a "false" value.

STORE        SPECIAL          Signals the end of the conversion
                              primitives for the target data set
                              being built.

Figure 5. Summary of the 7 conversion language primitives.

2.3.1 The Read Module Design

The purpose of the read module is to read a complete source record and
separate each field so it can be individually moved to a target field.
These two operations could be accomplished by a FORTRAN formatted read,
but this statement restricts the source input to 150 characters. Since
this restriction is unacceptable, an unedited FORTRAN read statement is
used to read the source record and several DECODE statements are used to
separate the fields. The number of words read by the unedited read is
calculated from the source input description. The DECODE statements
separate the fields from the input buffer and put them in a temporary
array, one field for each array word. Since the DECODE statement also
has a 150 character limit, several statements may be necessary. After
execution of the unedited read and DECODE statements, each source field
resides in a separate array word and can be directly addressed. During
the parsing of the input source file description, a symbol table is
filled which maps the source field names to their corresponding
temporary array addresses. For example, consider a source file
consisting of the field DAD (18 characters), a repeating group CHILDREN
(max=2, each 10 characters), and a repeating group PETS (max=3, each 8
characters) within the group CHILDREN. The source name to temporary
array location mappings are shown in figure 6.

SOURCE FILE DESCRIPTION

   | DAD | CHILDREN RG |
         | CHILD | PETS RG |
                 | PET-NAME |

SOURCE NAME TO ARRAY LOCATION MAPPING

   Field Name           Array Address
   DAD                        1
   CHILD #1                   3
      PET-NAME #1             4
               #2             5
               #3             6
   CHILD #2                   7
      PET-NAME #1             8
               #2             9
               #3            10

Figure 6. Example of source file to temporary array mapping.

CHILD #1 starts in location 3 instead of 2 because DAD is greater than
10 characters. Array words 1 and 2 are used to store the DAD field. The
symbol table does not itemize each field occurrence and its
corresponding temporary array address in the way figure 6 does. Rather,
the address of the first occurrence of each field, the number of words
between the first and second occurrences, and the maximum number of
occurrences are stored. This information is passed down from each level
to its subordinate levels. The symbol table for the previous example
would be

                       ----- Level 1 ------    ----- Level 2 ------
Field Name   1st Occ   Size Between  Max Occ.  Size Between  Max Occ.
DAD             1           0           0           0           0
CHILD           3           4           2           0           0
PET-NAME        4           4           2           1           3

Addresses for CHILD are the original (3) and the original plus the size
between occurrences (3+4=7). Addresses for PET-NAME are the original
(4) plus the size between occurrences for level 2 (4+1=5, 5+1=6) and the
same iteration for the second occurrence of level 1 (4+4=8, 8+1=9,
9+1=10). This algorithm is used when generating the conversion
assignment statements.
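
The address arithmetic above can be reproduced with a few lines of
Python. The sketch below is an illustration only. The function name
occurrence_addresses and the list-of-pairs layout for the per-level
columns are assumptions, chosen to mirror the symbol table rows shown
for DAD, CHILD, and PET-NAME.

```python
from itertools import product


def occurrence_addresses(first, levels):
    """List the temporary array address of every occurrence of a field.

    `first` is the address of the first occurrence. `levels` holds one
    (size_between, max_occ) pair per level, outermost level first. A
    level with max_occ == 0 means the field does not repeat there.
    """
    active = [(size, count) for size, count in levels if count > 0]
    ranges = [range(count) for _, count in active]
    # Each combination picks one occurrence number (from zero) per level.
    return [
        first + sum(k * size for k, (size, _) in zip(combo, active))
        for combo in product(*ranges)
    ]


dad = occurrence_addresses(1, [(0, 0), (0, 0)])
child = occurrence_addresses(3, [(4, 2), (0, 0)])
pet_name = occurrence_addresses(4, [(4, 2), (1, 3)])
```

With the three rows of the example table as input, the results agree
with the addresses itemized in figure 6.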

In summary, the read module logic consists of generating an unedited
read statement to move an entire source record into an input buffer.
DECODE statements are then generated which convert each field to its
proper internal representation and move the value to a temporary array
word. The previously built symbol table allows retrieving the proper
temporary array word for any occurrence of any source field.
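
The sizing decisions made by the read module can be sketched as
follows, in Python. This is a simplified illustration under stated
assumptions, not the generator's actual logic: it assumes 10 characters
per word (as implied by the 18 character DAD field occupying two words),
assumes the record is packed without padding, and groups fields under
the 150 character DECODE limit without regard to word alignment in the
input buffer. The names words_for and plan_read are hypothetical.

```python
CHARS_PER_WORD = 10   # assumption: implied by DAD (18 chars) using 2 words
DECODE_LIMIT = 150    # character limit stated for the DECODE statement


def words_for(width):
    """Number of array words needed to hold `width` characters."""
    return -(-width // CHARS_PER_WORD)   # ceiling division


def plan_read(widths):
    """Return (words_to_read, temp_array_words, decode_groups)."""
    words_to_read = words_for(sum(widths))
    temp_array_words = sum(words_for(w) for w in widths)
    groups, current, used = [], [], 0
    for w in widths:
        if w > DECODE_LIMIT:
            raise ValueError("field wider than the DECODE limit")
        if current and used + w > DECODE_LIMIT:
            groups.append(current)       # start a new DECODE statement
            current, used = [], 0
        current.append(w)
        used += w
    if current:
        groups.append(current)
    return words_to_read, temp_array_words, groups


# DAD (18), then two CHILD occurrences (10), each with three PET-NAMEs (8).
record = [18] + 2 * ([10] + 3 * [8])
plan = plan_read(record)
```

For the example record the temporary array needs 10 words, which
matches the highest address in figure 6, and a single DECODE group
suffices because the record is shorter than 150 characters.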

2.3.2 The Conversion Module Design

The conversion module is responsible for generating the FORTRAN code for
the data transformations and validation/conversion procedures. Since
the user is responsible for the validation and conversion code, the only
action the system takes is to replace the source field references with
their proper temporary array locations. This is done using the symbol
table mappings built during the source definition parsing, and a set of
indexes, one index corresponding to each source data group. The value
of each index represents the "current" occurrence of its corresponding
group. By computing

INDEX = <orig. pos.> + (<curr index 1> * <group 1 size>) + ...

for all groups the field in question is subordinate to, the correct
temporary array subscript is found. This computation statement is
generated before each source field reference. Then the field name is
replaced with the temporary array name, subscripted by the variable
INDEX (i.e. TEMP(INDEX)).
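
A minimal sketch of how such an INDEX statement might be produced is
given below, in Python. The loop variable names I1, I2, ... and the
exact text of the generated statement are assumptions. The sketch also
assumes the group indexes count occurrences from one, so one is
subtracted before multiplying; if the indexes count from zero the
formula applies exactly as written above.

```python
def index_statement(first, sizes):
    """Build the text of an INDEX assignment for one source field.

    `first` is the address of the field's first occurrence and `sizes`
    lists the size between occurrences for each enclosing group,
    outermost group first.
    """
    terms = [str(first)]
    for level, size in enumerate(sizes, start=1):
        terms.append(f"(I{level}-1)*{size}")
    return "INDEX = " + " + ".join(terms)


def index_value(first, sizes, occurrences):
    """Evaluate the same expression for 1-based occurrence numbers."""
    return first + sum(
        (occ - 1) * size for occ, size in zip(occurrences, sizes)
    )


statement = index_statement(4, [4, 1])      # PET-NAME in the example
third_pet_of_second_child = index_value(4, [4, 1], [2, 3])
```

For PET-NAME the statement combines the CHILDREN and PETS indexes, and
the third pet of the second child resolves to word 10 of the temporary
array.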

The data transformation algorithms must also generate similar statements
for all source field references. Before a source data value is moved to
a target field, INDEX must be computed. Then the value of the temporary
array, subscripted by INDEX, is moved to the target field. The other
task the data transformation algorithms must accomplish is generating
proper looping statements. These statements are needed so that the data
transformations are executed for all source field occurrences. The
DIRECT data transformation (moving values on the same level) requires a
loop for the group the source field is in, plus a loop for each group
the source field is subordinate to. Consider the DAD, CHILDREN, PETS
data structure in figure 6 as a source file, with a target "PETS"
database holding one pet per record. In order to address all of the pets
contained in a single input record, the CHILDREN group must be looped
through as well as the PETS group. Thus, this example would require
generation of two FORTRAN DO loops.


The REPEAT data transformation (moving upper level values down) requires
no additional loop statement generation. The current occurrence of the
parent group will contain the correct source field value. Using the DAD,
CHILDREN, PETS example again, consider moving the CHILD (name) into the
target "pet" record. The proper occurrence of the CHILDREN group must be
used. Since the previous DIRECT statement generated a loop for the
CHILDREN group, the proper index is guaranteed. The argument for this is
the following. If a source value is being moved "down" to a field in a
target group, the target group must have a corresponding source group.
At least one field in this corresponding source group must be moved to
the target group using the DIRECT transformation. Since DIRECT
generates loops for all groups above it, the parent group the REPEAT
refers to will be properly incremented.
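
The effect of the loops generated for DIRECT, and the reason REPEAT
needs none of its own, can be seen in the following Python simulation
of one source record. It is a sketch of the behaviour described above,
not the generated FORTRAN. The contents of TEMP and the function name
pet_records are invented for the example; the addresses and sizes are
those of figure 6.

```python
# Temporary array for one source record. Word 0 is unused so that the
# subscripts match the addresses of figure 6. The values are invented.
TEMP = [None, "JOHN SMITH", "SON JR", "ANN", "REX", "TOM", "KIT",
        "BOB", "FIDO", "SPOT", "BIRD"]

MAX_CHILDREN, MAX_PETS = 2, 3
CHILD_FIRST, PET_FIRST = 3, 4      # first-occurrence addresses
CHILD_SIZE, PET_SIZE = 4, 1        # words between occurrences


def pet_records(temp):
    """Build one (child, pet) target record per PET-NAME occurrence."""
    records = []
    for c in range(1, MAX_CHILDREN + 1):       # loop for CHILDREN group
        for p in range(1, MAX_PETS + 1):       # loop for PETS group
            # DIRECT: PET-NAME is on the same level as the target record.
            index = PET_FIRST + (c - 1) * CHILD_SIZE + (p - 1) * PET_SIZE
            pet = temp[index]
            # REPEAT: CHILD comes from the parent group. The CHILDREN
            # loop above already selects the proper occurrence.
            index = CHILD_FIRST + (c - 1) * CHILD_SIZE
            child = temp[index]
            records.append((child, pet))
    return records


records = pet_records(TEMP)
```

Each pet is paired with the child that owns it because the REPEAT
reference reuses the CHILDREN index set by the enclosing loop.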

The UPOP (Up Operation) data transformation is designed to perform an
operation on all field values contained in a subordinate group. Here
again loops must be generated for the group itself plus all groups
superior to it, up to the group level which called the transformation
(the DIRECT group level). Consider the previous example, but this time
the target database is a "DADS" database rather than a "PETS" database.
In this case the source level 0 fields would be moved to the target
level 0 fields using the DIRECT transformation. Consider a target field
defined as "NUM-PETS-OWNED", with the desire to store in each dad's
target record the number of pets he owns. A loop for the CHILDREN group
as well as the PETS group must be generated in order to count all of the
pets belonging to each source DAD record. It is not sufficient to
generate only a single loop for the PETS group.
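
The NUM-PETS-OWNED example can be simulated in Python as shown below.
This is an illustrative sketch only. It assumes that an unused
occurrence of a repeating group is left blank in the fixed length
record, which the text above does not state, and the record contents
and the function name count_pets are invented.

```python
def count_pets(temp, max_children=2, max_pets=3,
               pet_first=4, child_size=4, pet_size=1):
    """Count the non-blank PET-NAME occurrences in one source record."""
    count = 0
    for c in range(max_children):          # loop for the CHILDREN group
        for p in range(max_pets):          # loop for the PETS group
            index = pet_first + c * child_size + p * pet_size
            if temp[index].strip():        # assumption: blank = unused
                count += 1
    return count


# One source record laid out as in figure 6 (word 0 is unused).
# The first child owns two pets and the second child owns one.
record = [None, "JOHN SMITH", "SON JR", "ANN", "REX", "TOM", "",
          "BOB", "FIDO", "", ""]
num_pets_owned = count_pets(record)
pets_of_first_child_only = count_pets(record, max_children=1)
```

Looping over the PETS group alone visits only the first child's pets,
which is why the CHILDREN loop must also be generated.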


In summary, the conversion logic algorithms must accomplish two tasks.
The first is to generate code which will compute the correct temporary
array subscript for each source field occurrence. The second is to
generate looping statements so that a data transformation is executed
for all source field occurrences.

2.3.3 The Write Module Design

All target database "writes" are accomplished using the S2K PLI
statement INSERT <schema name>. The semantics of the INSERT statement
are to attach the <schema name> data set to the database, positioning it
according to the current values of each S2K set occurrence pointer.
Thus, if the level 0 occurrence pointer equaled 3, a level 1 data set
that is INSERTed would become a subordinate set of the third occurrence
of the level 0 data set. The entire write logic is, therefore, based on
ensuring the order of INSERT commands is correct. Using the DAD,
CHILDREN, PETS data structure as a target database, an INSERT for the
first DAD is followed by an INSERT for the first CHILD, which is followed
by as many INSERTs as there are PETS belonging to the first CHILD. Then
the next CHILD INSERT is issued, followed again by as many INSERTs as
there are PETS belonging to the second CHILD. This order is continued
until all CHILDREN for the first DAD have been inserted. Then the order
repeats, starting with an INSERT for the second DAD, etc.
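
The required INSERT order is a parent-first walk of the hierarchy. The
Python sketch below lists the commands for a small, invented data set.
The nested list representation, the tuple form of a command, and the
function name insert_order are assumptions for illustration; the real
system issues S2K PLI INSERT statements from the generated FORTRAN.

```python
def insert_order(dads):
    """List INSERT commands in the order the write logic requires.

    `dads` is a list of (dad, children) pairs, `children` is a list of
    (child, pets) pairs, and `pets` is a list of pet names.
    """
    commands = []
    for dad, children in dads:
        commands.append(("INSERT", "DAD", dad))
        for child, pets in children:
            commands.append(("INSERT", "CHILD", child))
            for pet in pets:
                commands.append(("INSERT", "PET", pet))
    return commands


data = [
    ("JOHN", [("ANN", ["REX", "TOM"]), ("BOB", ["FIDO"])]),
    ("MARK", [("SUE", [])]),
]
commands = insert_order(data)
```

Because each data set is inserted immediately after its parent, the
current occurrence pointers always identify the correct parent.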
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2.4  Execution  Section  Design 

The Source to S2K Conversion System generates a complete, self-contained
FORTRAN program which will perform the entire conversion job. This is
in contrast to generating unique conversion procedures and then calling
them when needed, as done by the EXPRESS system. In order to support
the execution phase, the Source to S2K System generates two files
containing UT-2D control commands. One file is needed to support
generation of the FORTRAN program and the second to control its
execution. Because several users may be using the system, unique names
for the generated programs and the command files must be assigned. The
rules for these names are contained in Section 3.2 of the User's Manual.
In order to generate these unique files, as well as simplify the user's
input, a single, fixed command file was designed. This file, named
"GENRATE", is called by the user. It will take the user's description
input and execute the Source to S2K Conversion System (see figure 7).
Here the FORTRAN program and the two command files are generated. Next,
file GENRATE calls the first command file just generated. The commands
in this file will sort the generated FORTRAN program (see Section
2.5.2), compile it, change the program's name to a unique name for that
user, and save it. When the user is ready to execute the conversion
job, the second generated command file is called. This file will
compile the FORTRAN program, ready the source input, target database,
and S2K software, execute the program, and save all files. Details and
examples of executing the system are contained in Section 3 of the
User's Manual. The command statements for the file GENRATE, and an
example of the two generated files, are contained in Appendix B.

Figure 7. Files and tasks involved in each Source to S2K Conversion job.


2.5 Implementation Details

Details of the system's design and major algorithms have already been
presented. This section will present details on the software
implementation of these algorithms. Specifically, the implementation
philosophy, basic program structure, major data structures, and an
itemization of each procedure, its function, and its inputs/outputs are
presented.

2.5.1 Implementation Philosophy

The programming philosophy used for this implementation is a result of
the author's 12 years of programming experience and recent graduate work
in Programming Methodology. While it is beyond the scope of this report
to document the entire philosophy, it may be of interest to highlight
certain programming concepts and techniques used to implement the
Source to S2K Conversion System. The programming concepts discussed can
be thought of as guidelines to "good" program construction. The
programming techniques are specific rules and procedures which
complement the concepts and help realize the program construction.

2.5.1.1 Programming Concepts

Reliable programming is not an easy task. Most systems are very large
and very complex. Due to this size, few programs can be completely
tested so that all possible input and output states are examined. In
light of these facts, it is believed reliable programs must be
constructed in a disciplined and systematic manner. This is the first
and most important programming concept. The second concept is to
construct programs using a hierarchy of abstractions. This means to
suppress the details of a function to the lowest level possible. The
purpose of this is to improve both the clarity and understandability of
the program. Dijkstra states, "The purpose of abstraction is not to be
vague but to create a new semantic level in which one can be absolutely
precise" [4]. How to recognize when a new semantic level is desirable,
as well as the total organization of the program, should be guided by
specific reasoning rather than intuition. The third concept, therefore,
is to use Parnas' work [14] in program module decomposition as the
criteria for determining the modules of a program. Briefly, these
criteria include:


1. Emphasize the interface between the modules rather than the
   traditional functional modularization.

2. From a given set of requirements, select the set of assumptions
   that are likely to change. Design modules around these
   assumptions and "hide" them in the module. Then select the
   assumptions that are unlikely to change and design the module
   interfaces around them.


The final concept is that of developing the program using a "stepwise
refinement" approach, as introduced by Wirth [20]. Stepwise refinement
means that program construction should be viewed as a sequence of
refinement steps. In each step a task is broken up into several
subtasks. As the descriptions of each subtask are refined, so should
the data structures used to support the tasks. Thus, the program and
supporting data structures are developed in parallel. The important
aspect of this concept is to recognize the possibility of improving an
algorithm when the data structures are refined. Rather than design the
data structures separately, they should be designed using the same
hierarchical process as that used to design the algorithm's logic.

2.5.1.2 Programming Techniques

The four concepts discussed (constructing the program in a disciplined,
systematic manner; using a hierarchy of abstractions; decomposing the
program modules based on their interfaces; and developing the program
and data structures using a stepwise refinement approach) must have
specific programming techniques to support them. The first technique
used was to ensure all requirement specifications were completed,
reviewed, and accurate before any system design work commenced. The
emphasis here was to study the elements in the system which were likely
to change and those which were likely to remain stable. From this study
a better decomposition of the system could be made during the design
phase.

Structured flowcharts were used to design the entire system before
coding began. Figure 8 is an example of a structured flowchart for an
algorithm to merge two sorted arrays into a single array. These charts
encourage a structured organization and the use of levels of
abstraction. The only programming constructs allowed are the
assignment, if..then..else, and case statements, procedure calls, and
"while" loops. The "while" looping invariant is clearly stated at the
top of each loop. Using this limited number of primitives encourages
simpler programming, fewer "tricks", and no GOTO statements.

Figure 8. Structured flowchart for merging two sorted arrays (A and B)
into a single array (C).

The third technique used was to select a programming language suited to
the philosophy. PASCAL was selected because of its block structure, its
control mechanisms (FOR, WHILE, and REPEAT..UNTIL statements), and its
capability to define heterogeneous data structures. PASCAL's
weaknesses, string manipulation and input/output, caused some problems
in parsing the user input. However, the capability to define elaborate
symbol tables (see Section 2.5.3) compensated for these weaknesses.


The final technique was to use the work in program verification (see
Floyd [5], Hoare [8], and Yeh [21]) as a guide towards writing correct
code. Individual procedures in the Source to S2K Conversion System were
not formally proven. However, each procedure was written with certain
verification rules and steps in mind. These steps were:


1. Examine the algorithm/task to be programmed and find the loop.

2. If there is a loop, establish a loop invariant. This invariant
   is that condition (B) which remains true throughout the
   processing of the loop, and becomes false when the loop
   terminates.

3. Initialize all variables before entering the loop, ensuring the
   invariant (B) remains true. If it does not, the initialization
   is in error or the invariant is not correct.

4. Ensure the code contained within the loop approaches the
   condition NOT (B). This step is taken to guarantee the loop
   will terminate.
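
The steps above can be illustrated with the merge of two sorted arrays
mentioned earlier as the flowchart example. The Python sketch below is
not the author's procedure. It uses condition (B) as the "while"
condition, and the assertion inside the loop is an additional,
hypothetical check on the partial result that is not part of the steps
listed.

```python
def merge(a, b):
    """Merge two sorted lists into a single sorted list."""
    c = []
    i = j = 0                     # step 3: initialize before the loop
    # Condition (B): both inputs still hold unmerged elements.
    while i < len(a) and j < len(b):
        # Hypothetical check: c holds the i + j elements merged so far,
        # in sorted order.
        assert len(c) == i + j and c == sorted(c)
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1                # step 4: every pass moves toward NOT (B)
        else:
            c.append(b[j])
            j += 1
    # NOT (B) holds here: at most one input has elements left.
    c.extend(a[i:])
    c.extend(b[j:])
    return c


merged = merge([1, 4, 9], [2, 3, 10, 11])
```

Each pass advances exactly one of the two positions, so the loop
condition must eventually fail and the loop terminates.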


The above steps, as well as the other techniques discussed, give
credence to the programming philosophy presented. These techniques were
not just investigated, but faithfully used. As a result, the Source to
S2K Conversion System implementation is well structured, can
accommodate modification, and is believed to be correct.


2.5.2 Basic Program Structure

Figure 9 is a structured flowchart of the Source to S2K Conversion
System program. The PARSE module reads all of the user input, checks
for syntax errors, and builds the symbol tables. If an error is found
in the user input, no FORTRAN program is generated. If no errors are
found and the user "asked" for program generation, a series of
generation modules are called, as shown in figure 9. Data needed to
generate the FORTRAN statements are in the symbol tables and other
variables which were filled by the PARSE module. A line of code is
generated and then written to the program output file FORTSRC (FORTRAN
source). Several situations arose where a line of code needed to be
generated immediately, but its subsequent write put it out of order.
Examples of this are generating a subroutine before all of the "main"
code is complete, and generating the beginning and ending of a DO loop
before generating the code contained in the loop. To solve this
problem, each generated line of code is written with a leading tag.
This tag represents the logical position of the generated line. When
the program is completely generated, a sort utility is called to sort
the program into proper tag sequence. This moves all generated lines of
code to their proper location in the program.
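
The tagging scheme can be sketched in a few lines of Python. The sketch
is an illustration of the idea only. The integer tags, the function
names emit and finish, and the FORTRAN text of the sample lines are
assumptions; the actual tag format and sort utility are not described
here.

```python
def emit(lines, tag, text):
    """Record one generated line together with its logical position."""
    lines.append((tag, text))


def finish(lines):
    """Sort the lines by tag and strip the tags.

    Python's sort is stable, so lines sharing a tag keep the order in
    which they were generated.
    """
    return [text for tag, text in sorted(lines, key=lambda item: item[0])]


generated = []
# The beginning and ending of the loop are generated first ...
emit(generated, 100, "      DO 10 I = 1, 2")
emit(generated, 300, "   10 CONTINUE")
# ... and the loop body afterwards, tagged to fall between them.
emit(generated, 200, "      INDEX = 3 + (I-1)*4")
emit(generated, 200, "      NAME = TEMP(INDEX)")
program = finish(generated)
```

Sorting after generation lets each routine write a line as soon as it
is known, without holding the whole program in memory in final order.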

All common parsing and generation functions were written outside of the
main structure of the program and made global to the procedures needing
them. These functions include searching for a particular character in
an input string, putting a series of characters into an output line
buffer, generation of leading tags, and all input and output. This
allowed placing the details of many functions in a single location,
hiding them from the main logic of the program. Other program details
may be found in the itemized Procedure Listing (Section 2.5.4).

2.5.3 Data Structures Used in the Program

A number of global data structures were declared in order to facilitate
communications between the PARSE and GENERATION routines. These global
structures may be considered the symbol tables of the system. The parse
routines "fill" the symbol tables with data received from the user. The
generation routines then "read" this data, thereby generating a unique
PLI FORTRAN program for the user. This section will describe the layout
and meaning of the important global data structures.

The Source to S2K Conversion program declared three primary data
structures: the Source, the Target, and the Conversion Symbol Tables.
Several other single word arrays were globally declared, but their usage
and description is evident upon examination of the program listing.
Because PASCAL's declaration notation is exceptionally concise and
readable, the actual PASCAL declarations for each symbol table will be
presented, with explanation.

The Source Declaration:

TYPE
   (* REPTYPE is declared first so that SRCTBLTYPE can refer to it. *)
   REPTYPE = RECORD
      REPNUM       : INTEGER;
      REPINCREMENT : INTEGER;
   END;

   SRCTBLTYPE = RECORD
      GRPNUM       : 1..MAXGROUPS;
      PARENTINDEX  : 1..MAXSRC;
      ORIGPOSITION : 1..MAXRECSIZE;
      EDITSPEC     : CHAR;
      DECIMALSPEC  : INTEGER;
      FLDSIZE      : INTEGER;
      REPEATARRAY  : ARRAY[1..MAXREPEATS] OF REPTYPE;
   END;

VAR
   SOURCETABLE : ARRAY[1..MAXSRC] OF SRCTBLTYPE;


The meanings of the declared fields are:

SRCTBLTYPE   - The name of the record type description for the
               Source Symbol Table.
GRPNUM       - The group number of the group the field resides in.
PARENTINDEX  - The Symbol Table index number of the field's group's
               parent.
ORIGPOSITION - The index number of the first occurrence of this field
               in the temporary array SRC.
EDITSPEC     - The field's input edit specification.
DECIMALSPEC  - The decimal specification for a type REAL.
FLDSIZE      - The size of the field.
REPEATARRAY  - Repeat array data for the level the field is at and
               all levels above it.
REPTYPE      - The name of the type description for the repeat array.
REPNUM       - The number of times this group repeats.
REPINCREMENT - Total size of the group.
SOURCETABLE  - The name of the Source Symbol Table with the layout
               as described in the type description SRCTBLTYPE.

The Target Symbol Table Declaration:

TYPE
   TARTBLTYPE = RECORD
      NAME       : ALFA;
      TARGETTYPE : CHAR;
      SIZE       : INTEGER;
      REPEATED   : BOOLEAN;
   END;

VAR
   TARGETTBL : ARRAY[1..MAXTARGET] OF TARTBLTYPE;


The meanings of the declared fields are:

TARTBLTYPE - The name of the type description of the Target
             Symbol Table.
NAME       - The component name for this field, or the data set
             name if the entry is a repeating group header.
TARGETTYPE - The field's type (i.e. integer, real or character).
SIZE       - The size of the field in 10 character words. (Type
             real and integer always are SIZE = 1 word).
REPEATED   - A flag to indicate whether the field is contained in
             a repeating group.
TARGETTBL  - The name of the Target Symbol Table as described by
             the type description TARTBLTYPE.


The Conversion Symbol Table Declaration:

TYPE

CONVTBLTYPE = RECORD
    CONVTYPE : CHAR;
    SRCNUM   : 0..MAXSRC;
    TRGNUM   : ALFA;
    TEMPTYPE : CHAR;
    OPER     : 0..5;
    MISC     : ALFA;
END;

VAR

CONVTBL : ARRAY[1..MAXCONV] OF CONVTBLTYPE;


The meanings of the declared fields are:

CONVTBLTYPE - The type description for the Conversion Symbol
              Table.
CONVTYPE    - The type of conversion statement this entry is
              (i.e., a DIRECT, REPEAT, LEVELUP, etc.).
SRCNUM      - The Source component for the conversion transformation.
              If SRCNUM = 0 the source is in the temporary variable.
TRGNUM      - The S2K component number for this conversion
              transformation. If CONVTYPE = STORE, this is the
              Repeating Group name to store.
TEMPTYPE    - If SRCNUM = 0, this indicates which temporary variable
              holds the source value (i.e. TEMPREAL, TEMPINT or
              TEMPCHAR).
OPER        - If CONVTYPE = LEVELUP, OPER = 0 if the term "LAST" is
              input. If CONVTYPE = UPOP, OPER indicates the operation
              to apply (i.e. 1=MAX, 2=MIN, 3=AVG, 4=COUNT, 5=TOTAL).
MISC        - If CONVTYPE = LEVELUP, MISC holds the specific
              occurrence numbers for each repeating group. If
              CONVTYPE = DIRECT, MISC holds the optional constant
              to be moved.
CONVTBL     - The name of the Conversion Symbol Table as described
              by the type description CONVTBLTYPE.
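
For readers more familiar with modern languages, the three symbol tables can be mirrored roughly as follows. This is only an illustrative sketch in Python, not part of the PASCAL implementation: the class names, the use of full strings such as "UPOP" for CONVTYPE (the original stores a single CHAR code), and the helper `describe` are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List

# Operation codes for CONVTYPE = UPOP, as listed for the OPER field.
UPOP_OPERATIONS = {1: "MAX", 2: "MIN", 3: "AVG", 4: "COUNT", 5: "TOTAL"}


@dataclass
class RepType:                 # mirrors REPTYPE
    repnum: int = 0            # number of times the group repeats
    repincrement: int = 0      # total size of the group


@dataclass
class SourceEntry:             # mirrors SRCTBLTYPE
    grpnum: int
    parentindex: int
    origposition: int
    editspec: str
    decimalspec: int
    fldsize: int
    repeatarray: List[RepType] = field(default_factory=list)


@dataclass
class TargetEntry:             # mirrors TARTBLTYPE
    name: str
    targettype: str
    size: int                  # in 10 character words
    repeated: bool


@dataclass
class ConvEntry:               # mirrors CONVTBLTYPE
    convtype: str              # assumption: spelled out, not a CHAR code
    srcnum: int                # 0 means "value is in a temporary variable"
    trgnum: str
    temptype: str = " "
    oper: int = 0
    misc: str = ""


def describe(entry: ConvEntry) -> str:
    """Hypothetical helper: summarize one entry using the field rules above."""
    if entry.srcnum == 0:
        source = f"temporary {entry.temptype}"
    else:
        source = f"source #{entry.srcnum}"
    if entry.convtype == "UPOP":
        return f"{UPOP_OPERATIONS[entry.oper]} of {source} -> {entry.trgnum}"
    return f"{entry.convtype} {source} -> {entry.trgnum}"
```

The point of the sketch is the SRCNUM = 0 convention and the OPER code table; the remaining fields simply carry the data described above.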


2.5.4  Itemized List of Program's Procedures


As detailed in Section 2.5.2, the Source to S2K program is
divided into two main modules (PARSE and GENERATION), with the
global symbol tables providing the communication between the two. This
section will discuss each module separately, presenting a "top-down"
view of the procedures, and give some details of the most important
ones.




Program's Top Level Documentation:

SRCTOS2K Program (Source to S2K)

(* Main Procedures *)

INITGLOBALS
PARSE
GENERATEALL

(* Utility Procedures Global to All *)

CONVTOCHR (Convert to Character)
CONVTOINT (Convert to Integer)

INITGLOBALS

FUNCTION -- All global data structures are initialized.

INPUTS/OUTPUTS -- No formal parameters. Global data structures needing
initialization are altered.

PARSE

FUNCTION -- To read the user input, ensure correct syntax and semantics
(where possible), and fill the symbol tables.

INPUTS/OUTPUTS -- The user's input descriptions are the input. The
output is the filled symbol tables.

GENERATEALL

FUNCTION -- To generate the PLI FORTRAN program and control command
files.

INPUTS/OUTPUTS -- The input is the filled symbol tables. The output
is the text files containing the FORTRAN program and control commands.

CONVTOCHR

FUNCTION -- To convert the inputted integer to its character form.

INPUTS/OUTPUTS -- Formal parameter "INT" is the input integer to be
converted to character form. The global word "TOKEN" is the output word
where the character form of the integer is stored.

CONVTOINT

FUNCTION -- To convert the inputted character word to its integer value.
CONVTOINT is declared a type integer function.

INPUTS/OUTPUTS -- The formal parameter is a 10 character word. The
output is the function name itself, where the binary value of the
inputted character form is stored.
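
The round trip performed by these two utilities can be sketched as below. This is an illustration in Python, not the PASCAL source: the function names are hypothetical, the result is returned rather than stored in the global word, and left justification with blank fill is an assumption borrowed from the way tokens are stored elsewhere in the program.

```python
WORD_SIZE = 10  # characters per word, as used throughout the report


def conv_to_chr(value: int) -> str:
    """Return the integer's character form as one 10 character word."""
    text = str(value)
    if len(text) > WORD_SIZE:
        raise ValueError("integer does not fit in one 10 character word")
    # Assumption: left justified, blank filled.
    return text.ljust(WORD_SIZE)


def conv_to_int(word: str) -> int:
    """Return the integer value held in a 10 character word."""
    return int(word.strip())
```

For example, `conv_to_int(conv_to_chr(1234))` gives back 1234.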


The PARSE Procedure Documentation:

PARSE

(* Main Procedures *)

COMMANDMODE
SOURCEMODE
S2KMODE
CONVERSIONMODE

(* Utility Procedures Global to PARSE *)

READACARD
GETTOKEN
SKIPBLK
EQUALOK
CHKPERIOD
ERROR

COMMANDMODE (Command Mode)

FUNCTION -- The user's command language inputs are parsed and
appropriate global data structures are filled with data.

INPUTS/OUTPUTS -- The input is the user's description input file. The
output is the globally declared single word arrays filled with data the
generation module will use.

SOURCEMODE (Source Mode)

FUNCTION -- The user's source input description is read and the source
symbol table is built.

INPUTS/OUTPUTS -- The input is the user's source file description. The
output is the completed source symbol table.

S2KMODE (System 2000 Mode)

FUNCTION -- The user's S2K target file description input is parsed here.
The target symbol table is also built.

INPUTS/OUTPUTS -- The input is the user's target file description. The
output is the completed target symbol table.

CONVERSIONMODE (Conversion Mode)

FUNCTION -- The user's conversion statements are parsed and the
conversion symbol table is built.

INPUTS/OUTPUTS -- The input is the user's conversion statements. The
output is the completed conversion symbol table.

READACARD (Read A Card)

FUNCTION -- To read a record from an input file and place the record in
a temporary text file. This text file will act as an input record
buffer, 80 characters long.

INPUTS/OUTPUTS -- Input formal parameter "FNAME" is the text file to be
read. The output is the text record buffer "LINE".

GETTOKEN (Get Token)

FUNCTION -- To scan a line of text and return the next "token". A token
is defined as a string of alphanumeric characters separated by
delimiters, where a delimiter is any non-alphanumeric character.

INPUTS/OUTPUTS -- Input formal parameter "FNAME" is a text file. The
file is 80 characters long and represents a single input character
buffer. Output is the global word "TOKEN" where the found token is
stored left justified, blank filled.
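
The token rule stated for GETTOKEN can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the function name and the explicit position argument are hypothetical (the original works on a text file buffer and a global word), and truncating tokens longer than one 10 character word is an assumption.

```python
WORD_SIZE = 10  # a token is returned in one 10 character word


def get_token(line: str, pos: int = 0):
    """Return (token, new_pos); token is "" when no token remains."""
    n = len(line)
    # Skip delimiters: any non-alphanumeric character.
    while pos < n and not line[pos].isalnum():
        pos += 1
    start = pos
    # Collect the run of alphanumeric characters.
    while pos < n and line[pos].isalnum():
        pos += 1
    token = line[start:pos]
    if not token:
        return "", pos
    # Left justified, blank filled (truncation is an assumption).
    return token[:WORD_SIZE].ljust(WORD_SIZE), pos
```

Calling it repeatedly on a card such as `LOCATION = DUNSDB/9294/1234.` yields the words LOCATION, DUNSDB, 9294 and 1234 in turn, since blanks, the equal sign, slashes and the period all act as delimiters.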


SKIPBLK (Skip Blanks)

FUNCTION -- To pass over all consecutive blank characters in the input
record buffer "LINE", starting with the current character pointer.

INPUTS/OUTPUTS -- The input is the current character position of text
file "LINE". The output is the new character position of "LINE".

EQUALOK (Equal Ok?)

FUNCTION -- To syntax check the presence of an equal sign, used in many
of the user input statements. If the equal sign is not present, the
proper error message is given.

INPUTS/OUTPUTS -- The input is the current position of the text file
"LINE". The output is the new position of "LINE". If the check for the
equal sign fails, the output also includes an error message.

CHKPERIOD (Check Period)

FUNCTION -- The function and inputs/outputs for CHKPERIOD are the same
as those for EQUALOK. The difference in the two procedures is that
CHKPERIOD checks for the presence of a period (.).

ERROR

FUNCTION -- To write the error message passed to it and set the global
error flag "ERRORFOUND".

INPUTS/OUTPUTS -- The input is an error message constant, 60 characters
long. The output is the error message being written to the printer and
the boolean "ERRORFOUND" being set to true.


The GENERATEALL Procedure Documentation:

GENERATEALL

(* Main Procedures *)

GENHEADER
GENSCHEMAS
GENINITIAL
GENOPEN
GENREAD
GENCONVERSION
GENCLOSE
GENERRORSUBS
GENCMDFILES

(* Utility Procedures Global to GENERATEALL *)

WRTTAG
WRTWORD
WRTINTEGER
WRTCONT

GENHEADER (Generate Header)

FUNCTION -- This procedure generates the heading of the PLI FORTRAN
program.

INPUTS/OUTPUTS -- The inputs are globally defined single word arrays
containing data about the source input file. The output is the FORTRAN
program statement, which includes the declaration of external files, and
a program comment explaining the purpose of the program.

GENSCHEMAS (Generate Schema Declarations)

FUNCTION -- This procedure generates the common block declarations for
the necessary System 2000 communication areas. One common block
declaration is required for each declared database repeating group.

INPUTS/OUTPUTS -- The input data is the target symbol table. The output
is the generated common block declarations.

GENINITIAL (Generate Initialization Code)

FUNCTION -- This procedure performs six generations: 1. generation of
real declarations for target database components of type REAL; 2.
generation of the global arrays "BUF" and "SRC"; 3. generation of a
parameter declaration for the "EMPTY" character; 4. generation of
FORMAT statements used in the FORTRAN program; 5. generation of
initialization code for the global buffers; and 6. generation of a
print header for the FORTRAN program.

INPUTS/OUTPUTS -- The corresponding inputs for the six generations are:
1. the target symbol table; 2. the source symbol table; 3. the
globally defined word "EMPTY"; 4. no input--a constant; 5. the source
symbol table; and 6. the globally defined words "DBNAME" and "FILENAME".
The outputs are the generation of the code, as described above.

GENOPEN (Generate Database Open Code)

FUNCTION -- This procedure generates the code for opening the target S2K
database. All of this code is the same for each job, with only the
database name and password being different.

INPUTS/OUTPUTS -- The inputs are the words "DBNAME" and "USERID",
containing the database name and the password respectively. The output
is the generated code, as described above.

GENREAD (Generate Read Code)

FUNCTION -- This procedure generates the unformatted read statements,
the decode statements and the format statements used by the decode
statements. When generating this code, each decode statement must
decode less than 150 characters of the input record, and each decode
statement must terminate on an even 10 character word boundary.

INPUTS/OUTPUTS -- The input is completely contained in the source symbol
table. The output is the generated code, as described above.
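
The two constraints on the decode statements (fewer than 150 characters each, ending on a 10 character word boundary) can be illustrated with a small sketch. This is written in Python under simplifying assumptions and is not the GENREAD algorithm itself: the function name is hypothetical, field boundaries are ignored, and the record is assumed to be padded up to a whole number of words.

```python
WORD_SIZE = 10     # characters per word
MAX_DECODE = 150   # each decode must cover fewer characters than this


def decode_chunks(record_length: int):
    """Split a record into (start, length) pieces that each satisfy
    both decode constraints."""
    # Largest multiple of WORD_SIZE strictly below MAX_DECODE (140).
    chunk = ((MAX_DECODE - 1) // WORD_SIZE) * WORD_SIZE
    # Assumption: the record occupies whole words, so round up.
    padded = -(-record_length // WORD_SIZE) * WORD_SIZE
    pieces = []
    start = 0
    while start < padded:
        length = min(chunk, padded - start)
        pieces.append((start, length))
        start += length
    return pieces
```

Under these assumptions a 300 character record would need three decode statements, covering 140, 140 and 20 characters.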

GENCONVERSION (Generate Conversion Code)

FUNCTION -- Each entry of the conversion symbol table is read and code
is generated for it. A procedure for each conversion statement is
declared within GENCONVERSION, and is called depending on the statement
type of the conversion symbol table entry.

INPUTS/OUTPUTS -- The inputs are the conversion symbol table and the
source symbol table. The source symbol table is needed to determine
in which array word of "SRC" the source value resides. The output is the
code to move the source fields into their respective target fields, to
store the data sets using the S2K INSERT command, and to generate FORTRAN
DO LOOPS which will properly iterate through the entire source file.

GENCLOSE (Generate Database Close Code)

FUNCTION -- This procedure produces the code to close the database and
produce a summary print out of the job.

INPUTS/OUTPUTS -- This code generation is the same for all jobs. The
differences are the database name and the print header generated. The
input, therefore, is the word "DBNAME", and the output the generated
code, as described above.

GENERRORSUBS (Generate Error Subroutine Code)

FUNCTION -- This procedure generates the code for the error subroutine.
This subroutine is called in the FORTRAN program for all S2K database
errors detected during execution. Because this is a standard
subroutine, the same code is generated for all jobs.

GENCMDFILES (Generate Command Files)

FUNCTION -- This procedure generates the files "COMPILC" and "EXECC".
These files contain the unique UT-2D control commands necessary for the
user to generate the FORTRAN program (file "COMPILC") and execute it
(file "EXECC"). Details on these processes are contained in the User's
Manual.

INPUTS/OUTPUTS -- The inputs are many globally declared words containing
information on the source input file, the target database file name and
location, and execution options available to the user. This data is
gained primarily from the user's Command Language input and the first
portion of the user's Source Language input. The output is the two
control command files. A complete example of these files can be found
in Appendix B.

WRTTAG (Write Tag)

FUNCTION -- This procedure writes the leading tag for each generated
line of FORTRAN code. This tag is later used as a sort key in order to
rearrange the generated FORTRAN code in proper sequence.

INPUTS/OUTPUTS -- The input is the formal parameter "INTAG" which
contains an integer value to output. The output is a 10 character
integer written to the file "FORTSRC", right justified, zero filled.
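
Why a fixed width, zero filled tag makes the later sort work can be seen in a short sketch. It is written in Python for illustration only: the function names and the in-memory list standing in for the file "FORTSRC" are assumptions, and the FORTRAN lines are placeholders.

```python
TAG_WIDTH = 10  # the tag is a 10 character integer


def write_tag(tag: int) -> str:
    """Format the tag right justified and zero filled."""
    return f"{tag:0{TAG_WIDTH}d}"


def emit(lines: list, tag: int, text: str) -> None:
    """Append one generated line, prefixed by its tag."""
    lines.append(write_tag(tag) + text)


def assemble(lines: list) -> list:
    """Sort on the tag, then strip it to recover the program text."""
    ordered = sorted(lines, key=lambda line: line[:TAG_WIDTH])
    return [line[TAG_WIDTH:] for line in ordered]
```

Because every tag has the same width, a plain character sort orders the lines numerically, so code may be generated in any convenient order and sequenced afterwards.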

WRT (Write String)

FUNCTION -- This procedure writes the string passed to it to the
current line of file "FORTSRC". FORTSRC is the file containing the
generated FORTRAN program.

INPUTS/OUTPUTS -- The input is the formal parameter "STRING". This
string is a 50 character constant which will be written, either in full
or until the first percent (%) is found. The output is the constant
string being written to the file "FORTSRC".
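
The percent sign acts as an end marker inside a fixed length string constant. A minimal sketch of that rule, in Python and with a list standing in for the current output line (both the function name and the list are assumptions):

```python
def wrt(out: list, string: str) -> None:
    """Append the string in full, or only up to the first percent sign."""
    out.append(string.split("%", 1)[0])
```

A fixed length constant such as `"      STOP%"` followed by padding would therefore contribute only `      STOP` to the generated line.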

WRTWORD (Write Word)

FUNCTION -- This procedure performs the same function as WRT, except it
will write a word to the file "FORTSRC" instead of a string.

INPUTS/OUTPUTS -- The input is the formal parameter "WORD" which is a 10
character word. The output is this word written to the file "FORTSRC",
either in full or until the first blank is encountered.


WRTINTEGER (Write Integer)

FUNCTION -- This procedure performs the same function as WRT, but it
writes an integer, in character form, to the file FORTSRC.

INPUTS/OUTPUTS -- The input is the formal parameter "INT" which is an
integer. This integer will be converted to its character form, then
written to FORTSRC. Only as many characters as the integer is long will
be written.

WRTCONT (Write Continue)

FUNCTION -- This procedure will generate a FORTRAN CONTINUE statement
with the tag and label as input in the parameters.

INPUTS/OUTPUTS -- The inputs are the formal parameters "INTAG" and
"FORMATNUM". The output is a FORTRAN CONTINUE statement with the tag of
INTAG and the label of FORMATNUM. This statement is used as a label to
indicate the end of a DO LOOP.


2.6  Final  Comments 

This report concentrated on a specific subset of the data
conversion requirement. This subset was converting a source file to a
defined target database. The source file must be hierarchically
describable and the target database must be defined and maintained by
the System 2000 DBMS. Although the solution to this subset is a system
limited only to the specific requirement, the ideas, system
architecture, and algorithms presented here are applicable to many data
conversion requirements. The idea of implementing a small, simple
system to satisfy a specific conversion requirement gives the system a
better chance of succeeding and being used. As an example, the user's
manual for Michigan's Data Translation system is 366 pages long. The
user may be able to write a unique program to satisfy his conversion
need faster than learning a system as large as Michigan's. The common
architecture of data conversion systems presented in this paper may be
used as a basis for design of any new conversion system. The specific
design options taken by the Source to S2K System appear to be the best
choices; however, a new conversion requirement would, obviously, dictate
the best choice for the design of its system. Finally, the algorithms
for implementing the data transformations are simple and satisfactory
for a procedurally oriented system.

The transportability of the Source to S2K Conversion System is
one of its weakest points. The system was written in PASCAL and
generates a non-ASCII standard FORTRAN program and job control commands
executable only by the UT-2D operating system. These languages were
chosen because they are the best supported languages at the
implementation site. If a conversion system of this type is intended
for more than one organization and/or machine, the system should be
written in COBOL or ALGOL 60, generate a COBOL PLI program, and generate
job control commands for a standard operating system. COBOL allows for
easier record description and editing than FORTRAN. ALGOL 60 does not
offer any advantages over PASCAL, but it is supported on more machines.

The final comment to be made is on the philosophy of the Source
to S2K Conversion System. It was designed with the philosophy of aiding
the user in the conversion task rather than completely accomplishing it.
This philosophy has several advantages. First, the implementation is
simplified by not designing for unusually complicated conversion
requirements which might arise. Second, the user's input is reduced and
simplified since he does not have to translate some complex portion of
his conversion requirement into an even more complex user's language.
Finally, the conversion execution may be improved when the user is
allowed to access the generated program prior to its execution.
Designing and implementing these conversion systems with the idea of
helping the user rather than ensuring completeness will give the system
a better chance of being used. The Source to S2K Conversion System has
achieved that goal.

SOURCE TO S2K CONVERSION SYSTEM USERS MANUAL


SECTION  1  --  GENERAL  DESCRIPTION 

1.A  INTRODUCTION:

The Source to S2K Conversion System gives the user the
capability of generating a complete S2K PLI FORTRAN program which, when
executed, will load his defined S2K database with his described source
input. The UT-2D control commands needed to execute the PLI program are
also automatically generated. When the user wishes to perform the
actual database load, a single card input is all that is needed. This
two step process, generation of the load program and actual execution,
provides the user with added flexibility. Because the generated program
is stored as a permanent file and is accessible to the user before
execution, the user may modify the generated program as much or as
little as he desires. For example, should the user wish to generate
only a "skeleton" load program and then write his own conversion
routines, this can be done. Or the user may generate the complete
program and then optimize heavily used routines for improved efficiency.
The intent is to automatically generate all source code, yet give the
user complete control of the final program.

1.B  SYSTEM CHARACTERISTICS:

The system is designed to convert a single source input file
into a single S2K defined database. It is also possible to append new
source data sets to an existing database. This feature is discussed in
Section 1.D. The system supports a tree-like hierarchical data model.
Thus, all input data must be in a form that can be described in a
hierarchical manner. Since S2K is the target database, all database
terminology will be consistent with that used in the S2K documentation.

1.C  SYSTEM USAGE:

In order to generate a complete PLI load program, the user is
required to make several inputs. First, descriptions of the source and
target files are necessary. The source file is described using the
Source Description Language, as defined in Section 2.C. The target
database is described using the same input as that used when the target
S2K database was described to the S2K system. Documentation of this
input is contained in the Basic S2K documentation (see Define Module).
Next, a procedurally oriented Conversion Language is used to define the
mappings between the source and target data fields. Since no data
movement is automatically assumed, each target field must have at least
one conversion statement describing how its data values are attained.

The three major inputs, the source, target, and conversion
descriptions, constitute the majority of the user required input. In
addition, a Command Language is used to tell the system general
characteristics about the job (user code, password, etc). All of these
languages are fully described, with examples, in Section 2. A
comprehensive example of an entire run is contained in Section 3.

1.D  LOADING DATA TO EXISTING DATABASES:

Occasionally an initial load may have several source input
files. If it is not practical to combine these files into a single
file, it is possible to generate several PLI FORTRAN conversion programs
which will initially load the database and then continue to append data
sets to the existing database. A boolean expression is input which will
identify a level 0 data set (see Section 2.B -- TYPE card). If the
level 0 data set exists, the new data sets will be appended to it. If
the boolean expression is not satisfied (i.e. the level 0 data set
does not exist), a new level 0 data set will be created that does
satisfy the expression. Then the input data sets will be appended to
the newly created level 0 data set. Note the restriction that the
boolean expression identifies a level 0 data set. This means that the
adding of data sets starts at level 1 (if the level 0 data set already
exists), or level 0 (if it does not already exist). For example,
suppose the input source consists of two files, one of DEPARTMENT data
and the other of EMPLOYEE data. Suppose the target database desires
level 0 data sets of DEPT data and level 1 data sets of the employees
working in that department. First, the DEPT data would be loaded,
followed by a run for the EMPL data. Each employee record would be read
and a search would be made for the department he works in. The search
is expressed by a boolean expression, such as C3 EQ S3 where C3 is
the S2K component number for Department Name, and S3 is the Employee
file component number for DEPT-WORKS-IN. Obviously, if the employee
file did not have a field specifying which department he worked in, it
would be impossible to realize the desired target database. Additional
restrictions on the creation of the boolean expression are contained in
Section 2.B -- TYPE card.
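
The append-or-create behaviour of an update load can be pictured with the DEPARTMENT/EMPLOYEE example. The sketch below is in Python and is only a model of the logic, not generated PLI FORTRAN: the dictionary standing in for the database, the record layout, and the names `update_load`, `name` and `dept_works_in` are all assumptions made for illustration.

```python
def update_load(database: dict, employees: list) -> None:
    """Append level 1 employee data sets under their level 0 department.

    `database` maps a department name (level 0) to its list of
    employee names (level 1).
    """
    for employee in employees:
        dept = employee["dept_works_in"]
        # Stand-in for the boolean expression "C3 EQ S3":
        # department name equals the employee's DEPT-WORKS-IN field.
        if dept not in database:
            # Expression not satisfied: create a level 0 data set
            # that does satisfy it.
            database[dept] = []
        # Append the new level 1 data set.
        database[dept].append(employee["name"])
```

In this model an employee whose department was loaded earlier is attached to it, while an employee naming an unknown department causes that department to be created first.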
SECTION 2 -- LANGUAGE DESCRIPTIONS

2.A  GENERAL RULES and RESTRICTIONS:

1.  There are 4 required language description modes. Before submitting
input to any mode, input the following card (* starts in col 1):

    ** <mode name> .

    where <mode name> = COMMAND     (general description)
                        SOURCE      (source input file description)
                        S2K         (S2K schema description)
                        CONVERSION  (source to target mappings)

2.  All input must be submitted on cards. Each language statement must
be terminated with a period (.), and contained on a single card.
Whenever a single blank is syntactically legal, any number of
consecutive blanks are also legal.

3.  The formal syntax is described using a "railroad track" notation (1).
Syntactically legal statements are derived by traveling the track from
left to right. All required entries are in bold print. Entries
contained in brackets ([ ]) indicate there are several options and to
choose one.

4.  Any error found in any description will prevent generation of the
FORTRAN program. All source input will, however, continue to be checked
for syntax.


(1)  This notation is used by the Burroughs Corporation in their
Programming Language Manuals. Niklaus Wirth also uses this notation in
his description of PASCAL, referring to it as "syntax diagrams" [1].
2.B  COMMAND DESCRIPTION LANGUAGE

PURPOSE:

The Command Description Language is used to provide general
information needed by the system. The only card required is the
LOCATION card.


LOCATION Card:

The UT-2D system requires the S2K database to be stored under a
specific file name and permanent library ID. The LOCATION Card is used
to specify these inputs.

SYNTAX:

    ---- LOCATION = ---<filename>--/--<libr. id>--/--<password>---- .

    where <filename>  = The file name under which the S2K definition
                        is stored and the S2K database itself will be
                        stored.
          <libr. id>  = The permanent library id number where the
                        above file name resides.
          <password>  = The password for the permanent library id.

EXAMPLE:

    LOCATION = DUNSDB/9294/1234.
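
As an illustration of how such a card breaks down into its three parts, the following Python sketch parses a LOCATION card with a regular expression. It is not how the PASCAL parser works (that uses the token scanner described in the report); the pattern, the function name `parse_location`, and the dictionary keys are assumptions introduced for this example.

```python
import re

# LOCATION = <filename> / <libr. id> / <password> .
# Any number of blanks is accepted wherever one blank is legal.
LOCATION_CARD = re.compile(
    r"^\s*LOCATION\s*=\s*"
    r"(?P<filename>[^/\s]+)\s*/\s*"
    r"(?P<libr_id>[^/\s]+)\s*/\s*"
    r"(?P<password>[^/\s.]+)\s*\."
)


def parse_location(card: str) -> dict:
    """Return the three LOCATION fields, or raise ValueError."""
    match = LOCATION_CARD.match(card)
    if match is None:
        raise ValueError("not a valid LOCATION card")
    return match.groupdict()
```

Applied to the example card above, the sketch separates the file name, the library id and the password; a card lacking the terminating period is rejected.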


The  following  optional  cards  may  be  input: 
RUN Card:

The RUN type card specifies whether to generate a FORTRAN
program and the UT-2D control commands, or check for syntax only.
Default is full generation.

SYNTAX:

                    |  S  |
    ---- RUN = -----|     |---- .
                    |  F  |

    where S = Syntax only
          F = Full generation (default)

EXAMPLE:

    RUN = S.


TYPE Generation Card:

The TYPE generation card specifies whether the generated program
is an initial load or an update program. If it is an update program
(see Section 1.D), a correct boolean expression must be included which
identifies a level 0 data set the new data is to be attached to. If the
boolean expression is not satisfied, a new level 0 data set is created
which does satisfy the expression. If more than one data set satisfies
the boolean expression, the first data set found will be used to attach
the new data to. The user is advised to select a boolean expression
which will uniquely identify the desired level 0 data set. This will
preclude erroneous database construction due to unknown input file
record order. Since the boolean expression will be included in the
FORTRAN program unaltered, its syntax should be the same as that
described in the Basic S2K Documentation (see Procedural Language
Fortran, PLF 6.6). The following additional restrictions should be
followed:

a.)  No more than 10 S2K components may be used in a single boolean
expression.

b.)  Use complete Source component numbers (see Section 2.C) and S2K
component numbers (no component names allowed).

c.)  All S2K components must be in the level 0 data set.

SYNTAX:

                     |  I                 |
    ---- TYPE = -----|                    |---- .
                     |  U <boolean expr>  |

    where I = Initial load (default)
          U = Update load
          <boolean expr> = a boolean expression which identifies
                           a level 0 data set.

EXAMPLE:

    TYPE = I.

    TYPE = U C2 .EQ. 10HPROGRAMMER.


EMPTY Field Character:

This card is used to identify which input source character will
signify null, or empty data. The default empty character is blank.

SYNTAX:

    ---- EMPTY = ----<character>---- .

    where <character> = Any legal CDC character.

EXAMPLE:

    EMPTY = %.


COMMAND LANGUAGE EXAMPLE:

    ** COMMAND.                 (enter command mode)
    LOCATION = DB/9692/1234.    (the S2K file name and location)
    RUN = F.                    (generate FORTRAN program)
    TYPE = I.                   (initial load run)
    EMPTY = 0.                  (the input source character 0 means no data)


2.C  SOURCE FILE DEFINITION LANGUAGE

PURPOSE:

The purpose of the Source Definition Language is to provide a
means of describing the logical, storage, and physical characteristics
of the incoming file. A thorough knowledge of the incoming source file
is necessary.

GENERAL RULES and RESTRICTIONS:

1.  The first input statement must be ** SOURCE.

2.  All source statements must be on a single card, one per card, each
ending with a period. All input after the period is treated as a
comment.

3.  No variable length records may be described.

4.  If the source file originated on the UT CDC 6400-6600 under control
of the operating system UT-2D, the system will handle the file without
any user intervention. If, however, the file originated elsewhere, the
user may have to examine the generated FORTRAN READ module prior to
actual execution of the conversion. This is because the UT-2D file
system uses unique end-of-line and end-of-file markers. Foreign file
formats may need to be read in an unorthodox manner to get the proper
results. The user is advised to get the "foreign file" into a
compatible CDC and UT-2D format.

SOURCE DEFINITION LANGUAGE STATEMENTS:

The storage and physical characteristics of the file are input
first. These include the File ID and the Device Type. If the Device
Type is TAPE, several additional statements must be input describing the
tape's characteristics.


File ID Card:

The FILE ID card is used to indicate the name of the input
source file and its permanent library id or local tape id. If the file
is a tape, input its tape number and password in place of the permanent
library id and password. If the file has no name (such as a card file)
then input the word NONE.

SYNTAX:

---- FILE = --<file name>--/--<libr. id>--/--<password>-- .

where <file name> = The name of the input source file.

      <libr. id>  = The permanent library id or the local tape
                    id for disk and tape files.

      <password>  = The password for the permanent library id or
                    local tape id.

examples: 

FILE = DATA1/9294/1234.    (disk file)

FILE = NONE.               (card deck)

FILE = NONE/1234/5555.     (unlabeled tape)


Device Type Card:

The input file device type can be either READER, DISK, or TAPE.
No other device types can be handled.

SYNTAX:

               | READER |
---- DEVICE = -| DISK   |-- .
               | TAPE   |


EXAMPLE: 


DEVICE  =  READER. 


If  the  Device  Type  is  TAPE,  the  following  statements  should  be  input, 
where  appropriate. 


Tape Origin Card:

The tape's origin must be identified. This will tell the system
whether the tape is in UT-2D tape format or a "foreign format".

SYNTAX:

               | UT    |
---- ORIGIN = -|       |-- .
               | OTHER |

where UT    = The tape was written under
              UT-2D control (default).

      OTHER = The tape was not written under
              UT-2D control.

EXAMPLE:

ORIGIN = OTHER.


Record Size Card:

If the tape origin is not UT, then the file's physical record
size must be given. This card should not be used if the tape origin is
UT.

SYNTAX:

---- RECORD = --<n>-- .

where n = The decimal integer value of
          the physical record length in
          units of 12-bit bytes.

EXAMPLE:

RECORD = 100.




Multireel Card:

The system must be notified if the input source file is more
than one reel long. The default is single reel.

SYNTAX:

                  | YES |
---- MULTIREEL = -|     |-- .
                  | NO  |

where YES = The file is more than 1 reel long.

      NO  = The file is 1 reel long (default).

EXAMPLE:

MULTIREEL = YES.


Density Card:

The tape's density must be input if it is not the default value
of 556 BPI.

SYNTAX:

                | LO |
---- DENSITY = -| HI |-- .
                | HY |

where LO = 200 BPI

      HI = 556 BPI

      HY = 800 BPI

EXAMPLE:

DENSITY = LO.


Continue After Parity Card:

The option of continuing processing after a parity error has
been encountered is available. The default option is to terminate the
run if a parity error occurs.

SYNTAX:

                 | YES |
---- CONTINUE = -|     |-- .
                 | NO  |

where YES = Processing will continue regardless
            of the number of parity errors.

      NO  = Processing will halt on occurrence of
            the first parity error (default).

EXAMPLE:

CONTINUE = YES.


EXAMPLES of STORAGE and PHYSICAL DESCRIPTIONS:

I. Source file is a card deck.

** SOURCE.
FILE = NONE.
DEVICE = READER.


II. Source file is a single reel, UT produced tape.

** SOURCE.
FILE = DATA2/1346/1441.
DEVICE = TAPE.
DENSITY = HI.
ORIGIN = UT.
CONTINUE = YES.


III. Source file is a foreign multireel tape. Each record contains
1440 bits.

** SOURCE.
FILE = NONE/1334/1234.
DEVICE = TAPE.
DENSITY = HY.
ORIGIN = OTHER.
RECORD = 120.
MULTIREEL = YES.
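
The RECORD value is stated in units of 12-bit bytes, so a record size
given in bits has to be divided by 12 before it is punched. The short
Python sketch below (an illustration only, not part of the Source to
S2K system) does that arithmetic for the 1440-bit record of example
III. The function name and the decision to reject a record that is not
a whole number of bytes are assumptions made for this sketch.

```python
def record_length_in_bytes(record_bits, byte_bits=12):
    """Return the RECORD card value for a physical record of record_bits bits."""
    if record_bits <= 0:
        raise ValueError("record length must be positive")
    whole, remainder = divmod(record_bits, byte_bits)
    if remainder:
        # Assumption: a partial 12-bit byte is treated as a description error.
        raise ValueError("record is not a whole number of 12-bit bytes")
    return whole


# Example III: a foreign tape whose records contain 1440 bits.
print("RECORD = %d." % record_length_in_bytes(1440))
```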


LOGICAL  DESCRIPTION  of  the  SOURCE  FILE: 


The logical description of the file is, essentially, its file
layout. Since only fixed length records may be defined, the description
language is quite straightforward. All fields must be identified with
an "S" and a unique integer, starting with 1 and incrementing by 1. To
describe the field's contents, an editing identifier is used, followed
by the field's size. For example, if the first source field were
-name-, a 25 alphanumeric character field, the source definition would
be S1 A25. This syntax is very similar to FORTRAN FORMAT conversion
and editing specifications. If a field or group of fields repeat
themselves in the file layout, the REPEAT <n> BEGIN ... END verbs may be
used. The fields described between the BEGIN and END statements will be
repeated n times. When using the REPEAT verb, the REPEAT, the field
description statements, and the END card must all be on separate cards.

SYNTAX:

   |-- REPEAT <n> BEGIN --|   |--<field desc>--|   |-- END --|
   |                      |   |                |   |         |
---|----------------------|---|----------------|---|---------|--- .

where
                                | I |
                                | O |
   <field desc> ::= --- S<sn> --| Z |--<field wd>---
                                | A |
                                | F |

   <n>        = The number of times the fields are to be
                repeated.

   <sn>       = The unique field number, starting with 1,
                incrementing by 1.

   <field wd> = This integer represents the number
                of bytes in the field. If the
                editing specification is F, the
                number of digits to the right of
                the decimal point must be input,
                i.e., 9.3.

Editing Specifications:

   I = Field's bytes represent an integer.

   O = Field's bytes represent 3 bit octal integers.

   Z = Field's bytes represent 4 bit hexadecimal integers.

   A = Field contains alphanumeric characters.

   F = Field is a decimal number. Number of
       decimal characters must be given in the
       <field wd> specification, i.e. 9.3 would
       mean the field is 9 bytes long, with 3
       digits to the right of the decimal point.

EXAMPLE:

Input Record--

I - I--I . I - | I--I |  —  | I  — I 

I  jon  1291  3  .  b  0  I  264|bl41B|  3  I  S>  1  4  4  A  I  2151 1  «>  A  1101 

| - |--| - | - | I -- I I  — I I  — I 


LOGICAL DESCRIPTION--

S1 A6.             Name (this is a comment field)
S2 I2.             Age
S3 F5.2.           Salary per hr.
S4 I4.             Days worked
REPEAT 3 BEGIN.    Start -SKILLS- RG
S5 A5.             Skill code.
S6 I2.             Num years experience at this skill.
END.               End -SKILLS- RG

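
To make the REPEAT ... END rule concrete, the Python sketch below (an
illustration only, not the system's own PASCAL code) expands a deck of
logical description cards into the flat field layout of one fixed
length record and totals its width. The card texts follow the skills
example as read here; the regular expressions, the function name
`expand`, and the error messages are assumptions made for this sketch.

```python
import re

# Patterns for the three card types; text after the closing period is a comment.
FIELD = re.compile(r"^S(\d+)\s+([IOZAF])(\d+)(?:\.(\d+))?\.")
REPEAT = re.compile(r"^REPEAT\s+(\d+)\s+BEGIN\.")
END = re.compile(r"^END\.")


def expand(cards):
    """Return the flat record layout as (field number, edit code, width) tuples."""
    levels = [[]]   # fields collected at each open REPEAT depth
    counts = []     # repeat count of each open REPEAT
    for card in cards:
        card = card.strip()
        repeat = REPEAT.match(card)
        if repeat:
            counts.append(int(repeat.group(1)))
            levels.append([])
            continue
        if END.match(card):
            if not counts:
                raise ValueError("END without a matching REPEAT")
            body = levels.pop()
            levels[-1].extend(body * counts.pop())
            continue
        field = FIELD.match(card)
        if field is None:
            raise ValueError("unrecognized card: %r" % card)
        levels[-1].append((int(field.group(1)), field.group(2), int(field.group(3))))
    if counts:
        raise ValueError("REPEAT without a matching END")
    return levels[0]


cards = [
    "S1 A6.            Name",
    "S2 I2.            Age",
    "S3 F5.2.          Salary per hr.",
    "S4 I4.            Days worked",
    "REPEAT 3 BEGIN.   Start SKILLS RG",
    "S5 A5.            Skill code.",
    "S6 I2.            Years experience",
    "END.              End SKILLS RG",
]
layout = expand(cards)
# Number of field occurrences and total record width in bytes.
print(len(layout), sum(width for _, _, width in layout))
```

Nested REPEAT groups are handled by the same stack, so a description
such as the EMPLOYEE group with inner SKILL and JOB-HIST groups expands
in the same way.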

2.D TARGET S2K DATABASE INPUT

PURPOSE:

The purpose of inputting the S2K database definition is to
define, for the program, the target database. The cards submitted here
must be the same define cards used to define the S2K database. These
include the database name and database password declaration cards.
Since these inputs should have already been
submitted to the S2K system, they are assumed to be syntactically and
semantically correct. If the user described his database interactively
and does not have card input, a proper deck may be produced as follows:

1. Change the S2K REPORT file to a temporary disk file.

2. Issue a DESCRIBE command.

3. Dump the temporary disk file to PUNCH.


S2K DATABASE DEFINITION INPUT EXAMPLE:

** S2K.

< S2K database definition card deck >


2.E CONVERSION DEFINITION LANGUAGE

PURPOSE:

The purpose of the Conversion Definition Language is to describe
how each target field's data value is derived and whether there are any
conversion or validation procedures to be applied to it.

GENERAL RULES and RESTRICTIONS:

1. The first input statement must be ** CONVERSION.

2. Each conversion statement must be contained on a single card, one to
a card, ending with a period. Text after the period is treated as a
comment.

3. Only component numbers may be used when referring to the target
fields. No component names may be used.

4. No moving of source to target data will take place without an
explicit conversion statement. Therefore, there must be at least one
conversion statement for each defined target file field.

5. The system makes no semantic analysis of the conversion statements.
The user must ensure the logical correctness of its statements.

6. Conversion will be in the order of the inputted conversion statements.
The user must ensure all desired data transformations are stated
before the STORE statement is input.


LANGUAGE DESCRIPTION:

There are 7 different conversion statements, grouped into three
categories: Data Transformations, Conversion and Validation Operations,
and the special STORE statement.

DATA  TRANSFORMATION  STATEMENTS: 

The  basic  transformations  required  to  restructure  hierarchically 
modelled  data  structures  are: 


1. Lateral--Move values from source to target fields which are on
the same corresponding level.

2. Down--Move a single source value into each of its lower level
(descendant) target fields.

3. Up--Move a single occurrence of a lower level source repeating
group field up to a target entity, or perform an operation on all
members of the source repeating group field and move this single result
up to the target field.


The central concept necessary to comprehend data transformations
is that of "correspondence" between the source and target data group
levels. Although a source and a target group may be on different
hierarchical levels, they may be in correspondence. A more formal
definition of this concept is:

correspondence: A target group X corresponds to a source
group Y if for every group instance in X there exists
a unique group instance in Y.


Consider the following source and target file descriptions:

SOURCE FILE DESCRIPTION

| DEPT-NAME | DEPT-ADDR | EMPLOYEE RG |

S1 A10.            Department name.
S2 A20.            Department address.
REPEAT 10 BEGIN.   Start EMPLOYEE RG (max 10 sets).
S3 A15.            Employee Name.
S4 I2.             Employee Age.
REPEAT 3 BEGIN.    Start SKILL RG (max 3 sets).
S5 A5.             Skill code.
S6 I2.             Years experience at this skill.
END.               End of SKILL RG.
REPEAT 5 BEGIN.    Start JOB-HIST RG.
S7 A15.            Company name.
S8 I2.             Year started with company.
S9 F9.2.           Yearly Salary.
END.               End of JOB-HIST RG.
END.               End EMPLOYEE RG.


TARGET (S2K) FILE DESCRIPTION

| EMPL-NAME | AGE | DEPT | X-SKILLS | PRIM-SKILL | JOBHIST RG |
                                                       |
                                          | COMPANY | SALARY | YEAR |

1* EMPL-NAME (NAME X(15))
2* EMPL-AGE (INTEGER 9(2))
3* PRESENT-DEPT (NAME X(10))
4* NUM-OF-SKILLS (NON-KEY INTEGER 9)
5* PRIMARY-SKILL (NAME X(5))
6* JOBHIST (RG)
7* COMPANY (NON-KEY NAME X(15) IN 6)
8* SALARY (NON-KEY MONEY 9(9).99 IN 6)
9* YEAR-STARTED (NON-KEY INTEGER 99 IN 6)

The source file contains three hierarchical levels, while the
target file contains two. The source file's second level (EMPLOYEE
record) corresponds to the target's first level (EMPLOYEE). Thus,
desired data on the first source level would have to be moved "down" to
the target file (e.g. DEPT-NAME to DEPT). Likewise, desired level 3
source data would either be moved up (e.g. NUM-OF-SKILLS) or moved
laterally across (e.g. all fields in JOBHIST repeating group).
Specific examples of these operations are presented with the discussions
of each data transformation operation.

DIRECT Statement:

DIRECT will perform the "lateral" data transformation, as
previously defined. Both source and target fields must be on the same
corresponding levels. If a constant value is desired in a target field,
the constant may be input in place of a source component number. This
should be a legal FORTRAN constant, i.e. nH should precede character
typed data.

SYNTAX:

               | <constant> |   |--<temp>--|
---- DIRECT ---|            |---|          |--- TO --<trgt #>-- .
               | <src #>    |   |----------|

where <src #>    = A component number from the
                   source file definition.

      <constant> = Any legal value which will be placed in
                   all occurrences of the target field.

      <temp>     = A FORTRAN variable used in a user
                   written CONVERSION statement.

      <trgt #>   = A component number from the target
                   S2K database definition.

EXAMPLES:

DIRECT S3 TO C1.             Employee Name.
DIRECT S7 TO C7.             Company name (JOBHIST RG).
DIRECT S9 TEMPREAL TO C8.    Converted salary.
DIRECT 9999.99 TO C8.        Constant put in salary.



REPEAT Statement:

REPEAT will perform the "down" data transformation, as
previously defined. The source field, which is at a higher
corresponding level than the target field, will be moved "down" to the
target field.

SYNTAX:

                          |--<temp>--|
---- REPEAT --<src #>-----|          |--- IN --<trgt #>-- .
                          |----------|

where <src #>  = A component number from the
                 source file definition.

      <temp>   = A FORTRAN variable used in a user
                 written CONVERSION statement.

      <trgt #> = A component number from the
                 S2K database definition.

EXAMPLES:

REPEAT S1 IN C3.             Dept-name.
REPEAT S1 TEMPINT IN C3.     Converted Dept-name.
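
The difference between the lateral and the down moves can be seen on a
small in-memory record. The Python sketch below is an illustration
only, not code produced by the system: it builds one target Level 0
set per EMPLOYEE occurrence, copying the employee name laterally
(DIRECT S3 TO C1) and replicating the single department name into every
set (REPEAT S1 IN C3). The dictionary layout, the sample values, and
the function name are assumptions made for this sketch.

```python
def build_level0_sets(source_record):
    """Return one target data set per source EMPLOYEE occurrence."""
    targets = []
    for employee in source_record["employees"]:
        targets.append({
            "C1": employee["S3"],       # DIRECT S3 TO C1 (same corresponding level)
            "C3": source_record["S1"],  # REPEAT S1 IN C3 (moved down from level 1)
        })
    return targets


record = {
    "S1": "SALES",
    "employees": [{"S3": "JONES"}, {"S3": "SMITH"}],
}
for target in build_level0_sets(record):
    print(target)
```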



LEVELUP Statement:

LEVELUP performs the "up" data transformation for the case of
moving a specific occurrence of a source repeating group field up to the
target field. The specific occurrence is indicated by the clause
"I=<n1,n2,...nn>". The values of <n1,n2,...> represent the specific
occurrences of each of the field's ancestor data sets, with the last
value <nn> representing the occurrence of the field's repeating group
itself. The order of "n" should be input in the same hierarchical order
as the database schema. Therefore, the first occurrence of a field
which is two levels down would be indicated by "I=n,1", where "n"
represents the occurrence of the field's parent. If the parent's third
occurrence is desired, "I=3,1" would be the proper input. In most
cases, the LEVELUP transformation will move source fields that are only
one level away. For example, the proper input for the second occurrence
of a repeating group only one level down would be "I=2". All
occurrences must be referenced by an integer except the "last"
occurrence in a repeating group. Since the last occurrence may be a
different relative number for each set, the term "LAST" may be used in
place of the integer "n".

SYNTAX:

                                               |--<temp>--|
---- LEVELUP --<src #>-- I = <n1,n2,...nn> ----|          |--- TO --<trgt #>-- .
                                               |----------|

where <src #>      = A component number from the
                     source definition.

      <temp>       = A FORTRAN variable used in a user
                     written CONVERSION statement.

      <trgt #>     = A component number from the target
                     S2K database definition.

      <n1,n2,...>  = The relative occurrence number for the field's
                     ancestor groups and the groups the field resides
                     in. "n" may also be the term "LAST".

EXAMPLE:

LEVELUP S5 I=1 TO C5.             First Skill Code.
LEVELUP S5 I=LAST TO C5.          Oldest (last) Skill Code.
LEVELUP S5 I=1 TEMPCHAR TO C5.    Converted Skill Code.
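
The occurrence clause can be read as a path through nested repeating
groups. The Python sketch below is an illustration only, not the
generated FORTRAN: it follows such a path, given as 1-based integers or
the term LAST, and returns the selected value. Representing repeating
groups as nested lists and returning None for an occurrence that does
not exist are assumptions made for this sketch; the manual does not say
what the system does in that case.

```python
def levelup(occurrences, path):
    """Select one value by following 1-based occurrence numbers (or "LAST")."""
    node = occurrences
    for n in path:
        if not isinstance(node, list) or not node:
            return None
        index = len(node) - 1 if n == "LAST" else n - 1
        if not 0 <= index < len(node):
            return None  # assumption: a missing occurrence yields a null
        node = node[index]
    return node


skills = ["6141B", "5144A"]             # skill codes of one employee
print(levelup(skills, [1]))             # I=1
print(levelup(skills, ["LAST"]))        # I=LAST

by_parent = [["6141B"], ["5144A", "9661B"]]
print(levelup(by_parent, [2, "LAST"]))  # I=2,LAST
```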


UPOP Statement:

UPOP (Up-operation) performs the "up" data transformation for
the case where an operation is performed on all occurrences of a source
repeating group field, deriving a single result from the operation.
This single result is then moved up to the target field. The available
operations include MAX, MIN, AVG, COUNT and TOTAL.

SYNTAX:

             | MAX   |
             | MIN   |               |--<temp>--|
---- UPOP ---| AVG   |---<src #>-----|          |--- TO --<trgt #>-- .
             | COUNT |               |----------|
             | TOTAL |

where MAX   = The source field's largest value in the RG.

      MIN   = The source field's smallest value in the RG.

      AVG   = The source field's average value.

      COUNT = The number of sets in the source RG where the
              source field's value is anything but -null-.

      TOTAL = The total of the source field's values in the RG.

EXAMPLE:

UPOP COUNT S5 TO C4.    Number of skills
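
The five operations reduce all occurrences of one repeating group field
to a single value. The Python sketch below is an illustration only, not
the generated FORTRAN. COUNT follows the definition above and counts
only non-null values; skipping nulls for MAX, MIN, AVG and TOTAL, and
using None to stand for a null field, are assumptions made for this
sketch.

```python
def upop(operation, values):
    """Reduce the occurrences of a repeating group field to one result."""
    present = [v for v in values if v is not None]  # None stands for -null-
    if operation == "COUNT":
        return len(present)
    if operation not in ("MAX", "MIN", "AVG", "TOTAL"):
        raise ValueError("unknown UPOP operation: %s" % operation)
    if not present:
        return None  # assumption: nothing to reduce gives a null result
    if operation == "MAX":
        return max(present)
    if operation == "MIN":
        return min(present)
    if operation == "TOTAL":
        return sum(present)
    return sum(present) / len(present)  # AVG


skill_codes = ["6141B", "5144A", None]     # third SKILL set is empty
print(upop("COUNT", skill_codes))          # UPOP COUNT S5 TO C4.
print(upop("TOTAL", [3, 21, None]))
```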
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validation  and  conversion  statements: 

Occasionally the source data values are not in the format or
content desired for the target record. Also, editing of the source
input is sometimes desired to provide increased data integrity. These
two capabilities are provided by the CONVERSION and VALIDATE statements.
Because it is impossible to predict the type of conversion or validation
routine a user may need, these statements only provide the means for the
user to write the actual conversion/validation code necessary. The user
written code is incorporated in the generated FORTRAN program unaltered
(except for the source component number). Thus, the user must adhere to
proper syntax, column spacing, etc.

CONVERSION Statement:

The CONVERSION statement gives the user the capability of
inputting FORTRAN source code which will execute desired conversions on
source fields. If possible, the results of the conversion should be
placed in the original source field. If, however, the result is a value
which will not legally fit in the original source field, the result
should be placed in a temporary variable. This temporary variable
should be one of the following, depending on the type of the result:

integer      TEMPINT

real         TEMPREAL

character    TEMPCHAR(1..N)

If the result is character, left justify the characters (10 characters
per word) in the variable "TEMPCHAR" starting with index number 1. The
proper TEMPCHAR subscripts must be used in the user written FORTRAN
code. However, no subscript should be input when referring to TEMPCHAR
in a data transformation statement. Subscript "1" will be assumed.

Since the CONVERSION statement only alters the value of the
source field, a data transformation statement must be used to actually
move the converted value to a target field. If a temporary variable is
used, both the original source field component number and the temporary
name must be included in the transformation statement. The temporary
name must be in the statement to tell the system the source value is in
the temporary and not the source field.


SYNTAX:

---- CONVERSION BEGIN. ----
----<user written FORTRAN code>----
---- END. ----


EXAMPLES:

CONVERSION BEGIN.
CCC
C  Add 1 year to each employee-age.
CCC
      TEMPINT = S4 + 1
END.
DIRECT S4 TEMPINT TO C2.


CONVERSION BEGIN.
CCC
C  Change all "5661B" skill codes to "9661B"
CCC
      IF (S5.EQ.5H5661B) S5=5H9661B
END.
LEVELUP S5 I=1 TO C5.



VALIDATE Statement:

VALIDATE gives the user the capability of validating a
particular source field value before it is stored in the database. The
user must write the validation code, just as is necessary for the
Conversion statement. Somewhere within the user written code the
FORTRAN variable FAIL must be set to TRUE or FALSE. If the validation
of the source field fails (i.e. FAIL = TRUE), the user may choose to
reject the data set being processed (REJSET) or put nulls in the source
field and continue processing the data set (REJFLD). If the user wishes
no action to be taken on a validation failure, the Conversion statement
should be used instead of the Validation statement. No data
transformation operation is associated with the validation statement, as
is the case with the Conversion statement. Thus, the proper data
transformation operation must be input following the validation
statement.

SYNTAX:

---- VALIDATE BEGIN. ----
----<user written FORTRAN code>----
                     | REJSET |
---- END FAIL = -----|        |-- .
                     | REJFLD |

EXAMPLE:

VALIDATE BEGIN.
CCC
C  Validate AGE-- 18<=AGE<=75
CCC
      FAIL=.FALSE.
      IF (S4.LT.18 .OR. S4.GT.75) FAIL=.TRUE.
END FAIL = REJFLD.
DIRECT S4 TO C2.
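
The two failure actions differ in how much of the data set survives.
The Python sketch below is an illustration only, not the generated
FORTRAN: REJSET drops the whole data set, while REJFLD nulls the
offending source field and lets processing continue. The dictionary
record, the use of None for a null field, and the function and
parameter names are assumptions made for this sketch.

```python
def apply_validate(record, field, is_valid, action):
    """Return the record to keep loading, or None if the data set is rejected."""
    if action not in ("REJSET", "REJFLD"):
        raise ValueError("action must be REJSET or REJFLD")
    fail = not is_valid(record[field])
    if not fail:
        return record
    if action == "REJSET":
        return None                 # reject the data set being processed
    cleaned = dict(record)
    cleaned[field] = None           # put nulls in the source field
    return cleaned


def age_ok(age):
    """Mirror of the AGE check: 18 <= AGE <= 75."""
    return 18 <= age <= 75


employee = {"S3": "JONES", "S4": 17}
print(apply_validate(employee, "S4", age_ok, "REJFLD"))
print(apply_validate(employee, "S4", age_ok, "REJSET"))
```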




STORE Statement:

S2K data sets must be built in the hierarchical order that they
are defined. A Level 0 data set must be created before its descendant
data sets may be "attached" to it. Data sets are created by loading the
fields with the desired data and "storing" the data set. Using this
system, the user loads the data fields using the data transformation
statements previously defined. He must also "store" the data set by
inputting the STORE statement. These statements should be input
immediately after the data transformation statement for the last field
in each data set. The set name should be the same as that used in the
S2K database definition. Use the name "LEVELO" for the Level 0 set
name.

SYNTAX:

---- STORE --<data set name>-- .

where <data set name> = The name of the data set, as defined
                        in the S2K database input.

EXAMPLES:

STORE LEVELO.
STORE JOBHIST.


SECTION 3 -- SYSTEM USAGE:

This section will give instructions on how to generate a FORTRAN
program, how to review and modify it, and how to execute it. A complete
example input and resulting generated program is then presented.

3.A HOW TO GENERATE A FORTRAN PROGRAM

To generate a FORTRAN program, the user must first have a card
deck containing the required Command language input, Source and Target
file descriptions, and the Conversion language inputs (see Sections 1
and 2 of this manual). This deck will serve as input to the system.
The complete job set-up is shown below. Note that the user is required
to input only a single command card with the input card deck. All of
the other commands needed are supplied by the Source to S2K system.

Job Set-up for Generating a FORTRAN Program:

<user id>
<password card>
<run card>                          (optional)
READCCF, 9294, GENRATE
7/8/9                               (multi-punched)
<users complete input card deck>
6/7/8/9                             (multi-punched)

The user will receive output from the system showing what was input and
any error messages. If there were no errors, a FORTRAN program is
generated and passed to the compiler. The user will then receive the
compiler output. It is possible to have FORTRAN syntax errors in the
generated program due to erroneous user input the Source to S2K system
did not find. The user may correct these errors in two ways. The first
is to correct the original input and generate a new FORTRAN program.
The second is to modify the generated FORTRAN program itself. This
procedure is described in Section 3.B.


3.B HOW TO MODIFY THE GENERATED FORTRAN PROGRAM

Each generated FORTRAN program must have a unique name,
otherwise different users would be erasing each other's files. Thus,
the name of the generated FORTRAN program is the first four letters of
the database name followed by the letters "SRC" (for source). For
example, if the database name is EXAMPL1, the generated FORTRAN program
would be stored under file name "EXAMSRC". All source files are stored
on permanent library 9294. Thus, in order to edit the FORTRAN program
for the EXAMPL1 database, the command

READPF, 9294, EXAMPL1

is all that is needed. The user can then modify this file using the
UT-2D editor EDIT. If batch editing is required, dumping the file to
punch will produce a card deck of the source FORTRAN program.
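
The naming rule is simple enough to state as code. The Python sketch
below is an illustration only, not part of the system: it derives the
generated program file name from the database name, together with the
command file name that Section 3.C builds from the same four letters.
The function name is hypothetical, and the "SRC" suffix and "EX" prefix
are the readings taken here from an imperfect scan of the manual.

```python
def generated_names(database_name):
    """Return (FORTRAN program file name, command file name) for a database."""
    stem = database_name[:4]          # first four letters of the database name
    return stem + "SRC", "EX" + stem  # suffix and prefix as read from the manual


program_file, command_file = generated_names("EXAMPL1")
print(program_file, command_file)
```

Because only the first four letters are used, two databases whose names
share those letters would map to the same file names under this rule.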


3.C HOW TO EXECUTE THE GENERATED PROGRAM

Once the generated program is free of errors, the user is ready
to perform the actual conversion. A file containing all of the required
control commands for each job is generated by the Source to S2K System
at the same time it generates the FORTRAN program. Since the file must
have a unique name, it is made up of the letters "EX" (for execute)
followed by the first four letters of the database name. Using the
example database EXAMPL1, the generated command file name would be
"EXEXAM". This file will also be stored on permanent library 9294. The
only input needed to execute the generated conversion program is shown
below. If the source input is on cards, they should be included in the
deck after the 7/8/9 multi-punched card.

Job Set-up for Executing the Generated FORTRAN Program

<user id>
<password card>
<run card>                 (optional)
READCCF, 9294, EXEXAM      (file name will be different for each job)
7/8/9                      (multi-punched card)
<if source input is cards, input them here>
6/7/8/9                    (multi-punched card)

Execution of the above job deck would result in execution of the
generated FORTRAN program as well as saving the S2K database on the
permanent library (as directed by the LOCATION card, see Section 2.B).
Should the user need to modify the generated command file, it may be
done in the same manner as modifying the generated FORTRAN program (see
Section 3.B).

3.D COMPLETE EXAMPLE

This example will use the source and target databases described
in Section 2.E. The system will generate a program to convert the
source input file "DBINPUT" to the S2K database "EXAMPL1". The S2K
database description is stored in file "EX1DESC" on permanent library
9899 (password 1221). The source file (DBINPUT) is a disk file on
permanent library 6656 (password 1334). A null field will be indicated
by the blank character.


Job input for Database EXAMPL1.

** COMMAND.
LOCATION = EX1DESC/9899/1221.
RUN = F.
TYPE = I.
EMPTY = .
** SOURCE.
FILE = DBINPUT/6656/1334.
DEVICE = DISK.
S1 A10.              Department name.
S2 A20.              Department address.
REPEAT 5 BEGIN.      Start EMPLOYEE RG (max 5 sets).
S3 A15.              Employee name.
S4 I2.               Employee age.
REPEAT 3 BEGIN.      Start SKILL RG (max 3 sets).
S5 A5.               Skill code.
S6 I2.               Years exp. at this skill.
END.                 End of SKILL RG.
REPEAT 3 BEGIN.      Start JOBHIST RG.
S7 A15.              Company name.
S8 I2.               Year started with company.
S9 F9.2.             Yearly salary.
END.                 End of JOBHIST RG.
END.                 End of EMPLOYEE RG.
** S2K.
USER,PASS1
NEW DATA BASE IS EXAMPL1
1* EMPL-NAME (NAME X(15))
2* EMPL-AGE (INTEGER 9(2))
3* PRESENT-DEPT (NAME X(10))
4* NUM-OF-SKILLS (NON-KEY INTEGER 9)
5* PRIMARY-SKILL (NAME X(5))
6* JOBHIST (RG)
7* COMPANY (NON-KEY NAME X(15) IN 6)
8* SALARY (NON-KEY MONEY 9(9).99 IN 6)
9* YEAR-STARTED (NON-KEY INTEGER 99 IN 6)
** CONVERSION.
DIRECT S3 TO C1.         (Employee Name)
DIRECT S4 TO C2.         (Employee age)
REPEAT S1 IN C3.         (Department name)
UPOP COUNT S5 TO C4.     (Number of skills)
LEVELUP S5 I=1 TO C5.    (Primary skill)
STORE LEVELO.            (End of LEVELO data set)
DIRECT S7 TO C7.         (History-company name)
DIRECT S9 TO C8.         (History-salary)
DIRECT S8 TO C9.         (History-year started)
STORE JOBHIST.           (End of JOBHIST data set)


Tne  following  FORTRAN  program  was  generated  by  the  source  to  S2K 
Conversion  System  as  a  result  of  the  above  input. 


      PROGRAM PROGEX1 (DBINPUT, OUTPUT, TAPE1=DBINPUT)
      IMPLICIT INTEGER (A-Z)
C
C *
C * THIS PROGRAM WILL READ FILE "DBINPUT" AND CONVERT
C * IT TO THE S2K DATABASE "EXAMPL1". THE NEWLY
C * BUILT DATABASE WILL BE STORED UNDER THE FILENAME
C * "EX1DESC" ON PERMANENT FILE NUMBER 9899.
C *
C ******************************
C
C -- COMMON BLOCK DECLARATION --
*PL COMMBLOCK/EXAMPL1/ SCHNME, RCODE, FILLER, LDSET, PASSW, NUMRG,
*PL   RGPOS, LEVEL, TIMEX, SDATE, CYCLE, SEPSYM,
*PL   ENDTERM, STATUSX.
C
C SCHEMA NAME -- LEVELO --
*PL SCHEMA/LEVELO OF EXAMPL1/ C1(2), C2, C3, C4, C5.
C
C SCHEMA NAME -- JOBHIST --
*PL SCHEMA/JOBHIST OF EXAMPL1/ C7(2), C8, C9.
C
*PL END SCHEMAS.
C
C -- REAL DECLARATIONS FOR SCHEMA --
      REAL C8, TEMPREAL, AVG
C
C -- GLOBAL DECLARATIONS --
      DIMENSION BUF(61), SRC(108)
C
C -- PARAMETER DECLARATIONS --
      EMPTCHR = 1
C
C -- MISCELLANEOUS FORMATS --
60    FORMAT(X,* INITIAL LOAD OF THE EXAMPL1 DATABASE*,//)
62    FORMAT(X,* TIME = *,A10,/,X,* DATE = *,A10,/)
1300  FORMAT(//,X,*-- EOF --*,/)
1320  FORMAT(//,X,*-- CLEARED DATABASE. CYCLE = *,I4)
1500  FORMAT(//,X,*-- PARITY ERROR ON LAST READ. BUF = *,/)
1520  FORMAT(//,X,*-- FORMAT ERROR ON LAST DECODE. BUF = *,/)
5000  FORMAT(//,X,*-- SUMMARY OF INITIAL LOAD RUN FOR EXAMPL1*,/)
5010  FORMAT(X,*NUMBER OF SOURCE RECORDS READ = *,I6)
5020  FORMAT(//,X,*NUMBER OF I/O ERRORS = *,I6)
C
C -- INITIALIZE LOCAL DATA --
      DO 55 I=1,108
      SRC(I) = 10H          
55    CONTINUE
      ERR = 0
      ICNT = 0
C

C -- PRINT INITIAL PROGRAM HEADER --
      PRINT 60
      CALL TIME(I, J)
      CALL DATE(J)
      PRINT 62, I, J
C
C -- OPEN DATABASE --
*PL START S2K.
      PASSW = 10HPASS1
*PL OPEN EXAMPL1.
      IF (RCODE.EQ.0 .AND. STATUSX.EQ.0) GOTO 70
      CALL PRTERR(1,1,RCODE)
      GOTO 999
C
70    CONTINUE
C
C -- PUT IN QUEUE MODE --
*PL QUEUE.
C
C -- MAJOR READ LOOP --
100   CONTINUE
      ICNT = ICNT + 1
      READ (END=900, 1) BUF
C
C -- CHECK FOR PARITY ON LAST READ --
      IF (IOCHEC(1) .NE. 0) GOTO 950
C
C -- ECHO PRINT INPUT (EVERY TENTH RECORD) --
      I = MOD(ICNT, 10)
      IF (I .EQ. 0) PRINT *, BUF
C
C -- DECODE STATEMENT NUMBER 1 --
      NUMCHAR = 120


C
      DECODE (.ERR.=960, NUMCHAR, 2010, BUF) (SRC(I), I=1,20)
2010  FORMAT(A10,A20,A15,I2,A5,I2,A5,I2,A5,I2,A15,I2,
     -  F9.2,A15,I2,F9.2)
C
C -- SHIFT REST OF BUFFER TO WORD ONE --
      J = 1
      DO 151 I = 13,61
      BUF(J) = BUF(I)
      J=J+1
151   CONTINUE
C
C -- DECODE STATEMENT NUMBER 2 --
      NUMCHAR = 90
      DECODE (.ERR.=960, NUMCHAR, 2020, BUF) (SRC(I), I=21,37)
2020  FORMAT(A15,I2,F9.2,A15,I2,A5,I2,A5,I2,A5,I2,A15,
     -  I2,F9.2)
C
C -- SHIFT REST OF BUFFER TO WORD ONE --
      J = 1
      DO 152 I = 22,61
      BUF(J) = BUF(I)
      J=J+1
152   CONTINUE
C
C -- DECODE STATEMENT NUMBER 3 --
      NUMCHAR = 90
      DECODE (.ERR.=960, NUMCHAR, 2030, BUF) (SRC(I), I=38,54)
2030  FORMAT(A15,I2,F9.2,A15,I2,F9.2,A15,I2,A5,I2,A5,I2,
     -  A5,I2)
C
C -- SHIFT REST OF BUFFER TO WORD ONE --
      J = 1
      DO 153 I = 31,61
      BUF(J) = BUF(I)
      J=J+1
153   CONTINUE
C
C -- DECODE STATEMENT NUMBER 4 --
      NUMCHAR = 100
      DECODE (.ERR.=960, NUMCHAR, 2040, BUF) (SRC(I), I=55,70)
2040  FORMAT(A15,I2,F9.2,A15,I2,F9.2,A15,I2,F9.2,A15,I2,A5)
C
C -- SHIFT REST OF BUFFER TO WORD ONE --
      J = 1
      DO 154 I = 41,61
      BUF(J) = BUF(I)
      J=J+1
154   CONTINUE
C
C -- DECODE STATEMENT NUMBER 5 --
      NUMCHAR = 130
      DECODE (.ERR.=960, NUMCHAR, 2050, BUF) (SRC(I), I=71,95)
2050  FORMAT(I2,A5,I2,A5,I2,A15,I2,F9.2,A15,I2,F9.2,A15,
     -  I2,F9.2,A15,I2,A5,I2,A5,I2,A5)
C
C -- SHIFT REST OF BUFFER TO WORD ONE --
      J = 1
      DO 155 I = 54,61
      BUF(J) = BUF(I)
      J=J+1
155   CONTINUE
C
C -- DECODE STATEMENT NUMBER 6 --
      NUMCHAR = 80
      DECODE (.ERR.=960, NUMCHAR, 2060, BUF) (SRC(I), I=96,108)
2060  FORMAT(I2,A15,I2,F9.2,A15,I2,F9.2,A15,I2,F9.2)
C

C -- CONVERSION PROCESSING --
C
C -- LEVELO DATA SET --
300   CONTINUE
      DO 320 I2 = 1,5
      INDEX2 = 4 + (I2-1)*21
      C1 = SRC(INDEX2)
C
      INDEX2 = 5 + (I2-1)*21
      C2 = SRC(INDEX2)
C
      INDEX1 = 1
      C3 = SRC(INDEX1)
C
      COUNT=0
      DO 330 I3 = 1,3
      INDEX3 = 7 + (I2-1)*21 + (I3-1)*2
      IF ( SRC(INDEX3) .EQ. EMPTCHR ) GOTO 330
      COUNT=COUNT+1
330   CONTINUE
      C4 = COUNT
C
      INDEX3 = 7 + (I2-1)*21 + (1-1)*2
      C5 = SRC(INDEX3)
C
*PL INSERT LEVELO.
      IF (RCODE.NE.0) CALL PRTERR(2,1,RCODE)
C
C -- JOBHIST DATA SET --
400   CONTINUE
      DO 410 I4 = 1,3
      INDEX4 = 28 + (I2-1)*21 + (I4-1)*12
      C7 = SRC(INDEX4)
C
      INDEX4 = 30 + (I2-1)*21 + (I4-1)*12
      C8 = SRC(INDEX4)
C
      INDEX4 = 29 + (I2-1)*21 + (I4-1)*12
      C9 = SRC(INDEX4)
C
*PL INSERT JOBHIST.
      IF (RCODE.NE.0) CALL PRTERR(2,2,RCODE)
410   CONTINUE
C

320   CONTINUE
C
C -- FINISHED WITH THIS RECORD. LOOP --
      GO TO 100
C
C -- EOF DETECTED ON LAST READ --
900   PRINT 1300
C
C -- CLOSE UP DATABASE --
*PL TERMINATE.
      IF (RCODE.NE.0) CALL PRTERR(4,1,RCODE)
*PL CLEAR.
      PRINT 1320, CYCLE
*PL CLOSE EXAMPL1.
      IF (RCODE.NE.0) CALL PRTERR(3,1,RCODE)
*PL END PROCEDURE.
C
      GOTO 999
C
C -- PARITY ERROR DURING LAST READ --
950   PRINT 1500
      PRINT *, BUF
      GOTO 100
C
C -- FORMAT ERROR DURING LAST DECODE --
960   PRINT 1520
      PRINT *, BUF
      ERR=ERR+1
      GOTO 100
C
C -- PRINT JOB SUMMARY --
999   CONTINUE
      PRINT 5000
      PRINT 5010, ICNT
      PRINT 5020, ERR
      END

C -- SUBROUTINE PRTERR (PRINT ERROR) --
C
C    THE PARAMETERS ARE:
C      INST : INSTRUCTION NUMBER, WHERE
C             1=OPEN, 2=INSERT, 3=CLOSE, 4=TERMINATE
C      LOC  : THE LOCATION IN THE PROGRAM THE ERROR
C             WAS DETECTED.
C      RTNC : THE RETURN CODE THE S2K SYSTEM RETURNED.
C
      SUBROUTINE PRTERR(INST, LOC, RTNC)
9000  FORMAT(/,X,* ----- DATABASE ERROR -----*,/)
9010  FORMAT(X,*INSTRUCTION = *,I3,* LOCATION = *,I3,
     -  * RETURNCODE = *,I3)
      PRINT 9000
      PRINT 9010, INST, LOC, RTNC
      RETURN
      END
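
The subscript arithmetic in the conversion processing section of the listing above, and the record length implied by its six DECODE statements, can be checked with a short sketch. The Python below is not part of the Source to S2K Conversion System; the function and constant names are hypothetical. The numeric constants (a stride of 21 SRC words per EMPLOYEE set, a stride of 2 per SKILL set, base offsets 4 and 7, and the six NUMCHAR values) are copied from the generated FORTRAN as printed. It assumes SRC is addressed from 1 and that an unused skill slot holds whatever value EMPTCHR stands for.

```python
# Sketch of the subscript arithmetic used by the generated program.
# Constants come from the listing; all names are illustrative only.

EMPLOYEE_STRIDE = 21   # SRC words per EMPLOYEE repeating-group set
SKILL_STRIDE = 2       # SRC words per SKILL set (code, years)
MAX_SKILLS = 3         # "max 3 sets" in the source description

DECODE_WIDTHS = [120, 90, 90, 100, 130, 80]  # NUMCHAR for statements 1-6
BUF_WORDS = 61                               # DIMENSION BUF(61)
WORD_CHARS = 10                              # one BUF word holds 10 characters


def src_index(base, emp, inner=1, inner_stride=0):
    """1-based SRC subscript: base + (emp-1)*21 + (inner-1)*inner_stride."""
    return base + (emp - 1) * EMPLOYEE_STRIDE + (inner - 1) * inner_stride


def count_skills(src, emp, empty):
    """Mirror of the DO 330 loop: count non-empty skill codes for one employee."""
    count = 0
    for skill in range(1, MAX_SKILLS + 1):
        if src[src_index(7, emp, skill, SKILL_STRIDE)] != empty:
            count += 1
    return count


def record_chars():
    """Total characters consumed by the six DECODE statements."""
    return sum(DECODE_WIDTHS)


if __name__ == "__main__":
    EMPTY = " " * 10
    src = [None] + [EMPTY] * 108      # slot 0 unused so subscripts are 1-based
    src[src_index(7, 2, 1, SKILL_STRIDE)] = "COBOL"
    src[src_index(7, 2, 2, SKILL_STRIDE)] = "FORTR"
    print(src_index(4, 2))            # employee-name subscript, second set
    print(count_skills(src, 2, EMPTY))
    print(record_chars(), BUF_WORDS * WORD_CHARS)
```

The last line prints the same number twice: the six NUMCHAR values add up to exactly the 61 ten-character words that one READ places in BUF.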
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APPENDIX  B 

Generated  Command  File  Examples 

The following example file listings contain UT-2D commands
generated by the Source to S2K Conversion System.


FILE "GENRATE"

File "GENRATE" is called by the user to read his input
and generate a conversion program.
"GENRATE"'s commands, as listed below, are the same for each user
and database.

File GENRATE

EXECPF, 9294, SRCTS2K
READCCF, 9294, COMPILC


FILE  "COMPILC" 

File "COMPILC" is called by file "GENRATE" during the
conversion program generation phase. This file is unique to each
database user. The database name for this example is "DUNS".
Note the name of the generated conversion program ("DUNSSRC") and
generated execution commands file ("EXDUNS").


File COMPILC

REWIND FORTSRC
SORTPRG
RENAME FORTSRC DUNSSRC
SAVEPF, 9294, 3217, DUNSSRC
RENAME EXECC EXDUNS
SAVEPF, 9294, 3217, EXDUNS
REWIND DUNSSRC
PUBLIC, PLF, I=DUNSSRC, B=DUNSOBJ, P,E=3


FILE  "EXDUNS" 


File "EXDUNS" is a unique file, with a unique file name, for
each database user. Note that the file recompiles the generated
conversion program, readies the database files, readies the input
source file, executes the generated conversion program and
finally saves the database. The following data is relevant to
this example:


Database Name: DUNS
File Name Database Is Stored Under: DUNSDB
Database File Library Number and Password: 9299/3642
Source File Name: INFILL
Source File Device: TAPE
Source File Tape Number and Password: 8868/1648
Source File Characteristics: Foreign Tape; Density=800BPI;
                             Record Length = 1200 bytes

File "EXDUNS"

READPF, 9294, DUNSSRC
PUBLIC, PLF, I=DUNSSRC, B=DUNSOBJ, P,E=3
READPF, 9299, DUNSDB
S2KWS, DP, DUNSDB
REQUEST, INFILL, 8868/1648, WO, HI, B, 100.
DUNSOBJ
S2KPS, DS, DUNSDB
SAVEPF, 9299, 3642, DUNSDB




REFERENCES


[1]  BAKKOM, David E.,
"Implementation of a Prototype Generalized File Translator",
Honeywell Information Systems, Inc., Proc. 1975 ACM SIGMOD
International Conference on Management of Data, San Jose, Cal.,
1975, pp. 99-110.

[2]  BUNEMAN, Peter O., et al.,
"ASAP to PH? Efficient Relational Data Bases from Very Large
Files", University of Pennsylvania, [report designation illegible],
January 1975.

[3]  [title illegible],
Systems Development Corporation, November 14, 1975.

[4]  DIJKSTRA, E.W.,
"The Humble Programmer", Communications of the ACM, Vol. 15, No.
10, pp. 859-866, October 1972.

[5]  FLOYD, R.W.,
"Assigning Meanings to Programs", Proceedings of American
Mathematical Society Symposium, Applied Mathematics, Vol. 19,
pp. 19-31, 1967.

[6]  FRY, James P., et al.,
"A Developmental Model for Data Translation", Proc. of 1972 ACM
SIGFIDET Workshop on Data Description, Access and Control,
Denver, Colo., pp. 77-105, 1972.

[7]  FRY, James P., et al.,
"An Approach to Stored Data Definition and Translation",
University of Michigan, Air Force Office of Scientific Research
Report [number illegible], December 1972.

[8]  HOARE, C.A.R.,
"An Axiomatic Basis for Computer Programming", Communications of
the ACM, Vol. 12, No. 10, pp. 576-583, October 1969.

[9]  JENSEN, K., and WIRTH, N.,
PASCAL User Manual and Report, Springer-Verlag, New York, 1974.

[10] KOEHR, G.J., et al.,
Data Management Systems Catalog, The Mitre Corporation, Bedford,
Mass., January 1973.


[11] MERTEN, Alan G., et al.,
"A Data Description Language Approach to File Translation", Data
Translation Project, Proceedings ACM SIGMOD Workshop on Data
Description, Access and Control, Ann Arbor, Michigan, pp.
191-205, May 1974.

[12] MERTEN, Alan G.,
"A Theoretical Analysis on Data Definition and Translation", Air
Force Office of Scientific Research Report [number illegible],
December 1976.

[13] NAVATHE, Shamkant B., et al.,
"Restructuring for Large Databases: Three Levels of
Abstractions", ACM Transactions on Database Systems, Vol. 1, No.
2, pp. 138-158, June 1976.

[14] PARNAS, D.L.,
"On the Criteria To Be Used In Decomposing Systems Into Modules",
Communications of the ACM, Vol. 15, No. 12, pp. 1053-1058,
December 1972.

[15] RAMIREZ, J.A., et al.,
Automatic Generation of Data Conversion Programs Using a Data
Description Language, Vols. I, II, University of Pennsylvania,
Philadelphia, Pa., May 1973.

[16] RAMIREZ, J.A.,
"Automatic Generation of Conversion Programs Using a Data
Description Language (DDL)", Ph.D. Dissertation, University of
Pennsylvania, 1973.

[17] ROUT, David J.,
"Converting from Rectangular to Relational Data Bases",
University of Pennsylvania, Office of Naval Research Report
[number illegible], September 1976.

[18] SHOSHANI, A.,
"A Logical-Level Approach To Data Base Conversion", Systems
Development Corporation, Proceedings 1975 ACM SIGMOD Conference
on Management of Data, San Jose, Cal., pp. 112-122, 1975.

[19] SHU, N.C., et al.,
"EXPRESS: A Data EXtraction, Processing and REStructuring
System", ACM Transactions on Database Systems, Vol. 2, No. 2,
pp. 134-174, June 1977.

[20] WIRTH, N.,
"Program Development by Stepwise Refinement", Communications of
the ACM, Vol. 14, No. 4, pp. 221-227, April 1971.

[21] YEH, R.T., and BASU, S.K.,
"Strong Verification of Programs", IEEE Transactions on Software
Engineering, Vol. SE-1, No. 3, pp. 339-346, September 1975.



VITA 


Jonathan Lee Stevens was born in Roswell, New Mexico on
July 24, 1948, the son of Jack D. and Yvonne K. Stevens. After
graduating from Lakenheath High School, Lakenheath, England, he
attended the University of Washington, Seattle, Washington, for
one year. He then received a Presidential appointment to the
United States Air Force Academy, entering in June 1967. He
graduated with the degree of Bachelor of Science in Computer
Science and the rank of 2nd Lieutenant in the United States Air
Force. From August 1971 to October 1973 he was assigned with the
4629th Support SALE Squadron as a Computer Programmer, and was
promoted to 1st Lieutenant. From October 1973 to August 1977 he
worked as a Systems Analyst at the Military Personnel Center and
was promoted to Captain. In August 1977 he entered the Graduate
School of the University of Texas.


Permanent Address: 9117-189th Place, S.W.
                   Edmonds, Washington, 98020




